W-R134  209 
UNCLASSIFIED 


DESIGN AND ANALYSIS OF CHRONIC AQUATIC TESTS OF TOXICITY(U) OHIO STATE UNIV RESEARCH FOUNDATION COLUMBUS P I FEDER ET AL MAY 81 DAMD17-79-C-9150

F/G 6/20 NL


AD 


DESIGN  AND  ANALYSIS  OF  CHRONIC  AQUATIC  TESTS  OF  TOXICITY 


Final  Report 


By

Paul  I.  Feder 

Department  of  Statistics,  The  Ohio  State  University  and 
Applied  Statistics  Group,  Battelle  Columbus  Laboratories 

William J. Collins

Department  of  Entomology,  The  Ohio  State  University 
(15  September  1979  -  31  October  1980) 


Supported  by 

U.S.  Army  Medical  Research  and  Development  Command 
Fort  Detrick,  Frederick,  Maryland  21701 


Contract  No.  DAMD  17-79-C-9150 


DTIC
ELECTE
NOV 2 1983


The  Ohio  State  University  Research  Foundation 

1314 Kinnear Road

Columbus, Ohio 43212


Contract  Officer's  Technical  Representative:  Paul  H.  Gibbs 
U.S.  Army  Medical  Bioengineering  Research  and  Development  Laboratory 
Fort  Detrick,  Frederick,  Maryland  21701 




Approved  for  public  release;  distribution  unlimited 


The  findings  in  this  report  are  not  to  be  construed  as  an  official 
Department  of  the  Army  position  unless  so  designated  by  other 

authorized documents.


SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)


REPORT  DOCUMENTATION  PAGE 


4. TITLE (and Subtitle)

DESIGN  AND  ANALYSIS  OF  CHRONIC  AQUATIC  TESTS 
OF  TOXICITY 


7. AUTHOR(s)

Paul  I.  Feder 
William  J.  Collins 


READ  INSTRUCTIONS 
BEFORE  COMPLETING  FORM 


1.  RECIPIENT'S  CATALOG  NUMBER 


5. TYPE OF REPORT & PERIOD COVERED

Final  Report 
9/15/79  -  10/31/80 


6.  PERFORMING  ORG.  REPORT  NUMBER 

761860/712404 


8. CONTRACT OR GRANT NUMBER(s)

DAMD17-79-C-9150


10. PROGRAM ELEMENT, PROJECT, TASK
AREA & WORK UNIT NUMBERS


621102A. 3E161102BS04 . 
00.048 


9. PERFORMING ORGANIZATION NAME AND ADDRESS

The  Ohio  State  University  Research  Foundation 
1314  Kinnear  Road 
Columbus,  Ohio  43212 


11. CONTROLLING OFFICE NAME AND ADDRESS
U.S. Army Medical Research and Development Command
Medical Bioengineering Research and Development Lab.
Fort Detrick, Frederick, Maryland 21701

12. REPORT DATE
May, 1981

13. NUMBER OF PAGES
415


14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office)    15. SECURITY CLASS. (of this report)

Unclassified 

15a. DECLASSIFICATION/DOWNGRADING
SCHEDULE


16. DISTRIBUTION STATEMENT (of this Report)


Approved  for  public  release;  distribution  unlimited 


17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report)


19. KEY WORDS (Continue on reverse side if necessary and identify by block number)

chronic aquatic toxicity tests
multiple comparisons
standardized data reporting
dose response estimation
tests of hypothesis
computer programs
confidence intervals
statistical analysis
fathead minnows


20. ABSTRACT (Continue on reverse side if necessary and identify by block number)

The present investigation considers aspects of the planning of, reporting
results from, and the statistical analysis of data arising from chronic aquatic
tests of toxicity with fathead minnows. The discussion emphasizes analyses and
data displays for the interpretation of qualitative mortality and abnormality
data and quantitative weight data.

The report consists of nineteen sections. Sections I-IV describe the
literature search, argue for and present a set of standardized data reporting
formats, and contain some suggestions for further improving and standardizing
the test procedure. Section V briefly describes the early life stage testing
procedure and the resulting data. Sections VI-XIX discuss various topics
pertaining to the statistical analysis of such data and to statistical aspects
of the design of toxicity tests. Techniques discussed include preliminary
graphical displays, tests for homogeneity among tanks within treatment groups,
adjustments to account for tank to tank heterogeneity within groups, outlier
detection procedures, overall tests for treatment effects, multiple comparisons
and pairwise confidence intervals of treatment groups with the control group,
analysis of variance, regression analysis, and parametric and nonparametric dose
response curve fits along with associated estimates of "safe concentrations".
All suggested procedures are illustrated with real examples based on aquatic
toxicity data. Several newly developed, specialized computer programs are
described to carry out nonstandard analyses.






EXECUTIVE  SUMMARY 


Purpose 

During the past several decades problems of environmental contamination
have become increasingly important from both the scientific and the
legal standpoints. In recent years a great deal of attention has been
directed to the potential toxicity to aquatic organisms of chemicals
discharged into water bodies.

The  U.S.  Army,  through  activities  such  as  munitions  manufacturing, 
operates  a  number  of  plants  that  produce,  consume,  or  discharge  a  variety 
of  chemical  substances.  Some  of  these  discharges  enter  bodies  of  water 
inhabited  by  various  aquatic  species.  Thus  the  Army  must  provide  the 
USEPA  with  safety  data  concerning  the  levels  of  such  discharges  and  the 
possible  extent  of  resulting  surface  water  and  ecological  contamination. 
In  order  to  develop  such  data  the  Army  conducts  both  intramural  and 
extramural  programs  of  aquatic  toxicity  testing. 

Considerable  amounts  of  time,  money,  and  manpower  are  expended 
by  the  Army  in  such  aquatic  toxicity  testing  programs.  To  make  these 
programs  more  efficient  and  more  effective,  the  need  has  been  felt  for 
a  reexamination  of  some  of  the  standard  methods  used.  This  has  been 
especially  true  of  statistical  methods  involved  in  the  design  of  testing 
programs  and  the  analysis  of  resulting  data.  This  study  is  an  effort 
to  make  some  progress  in  those  directions. 

The results in this study indicate areas where the conduct of chronic
aquatic toxicity tests, and the summarization and reporting of their
results, can be further standardized and made easier to understand. A
number of the statistical approaches and procedures discussed and/or
developed in this study have not, to the authors' knowledge, been
previously applied to aquatic toxicity data. Compared with standard
methodology, these improved methods provide more information about the
structure, relations, and anomalies in the data. They enhance the
sensitivity of statistical analyses, so that greater precision of results
can be obtained without increasing the amount of testing. In brief, this
study provides methods that should improve the reporting and statistical
analysis of data from chronic aquatic toxicity tests. This will enhance
the sensitivity of conclusions that can be derived from these tests,
thereby increasing their efficiency.




Approach 


All the statistical procedures discussed in this report are
illustrated with examples based on real data from chronic tests with
fathead minnows. At the outset of the study the principal investigators
visited the USEPA Environmental Research Laboratory at Duluth to become
oriented to the apparatus and procedures used in chronic toxicity tests.
Discussions were held with various investigators concerning the details
of their studies. Some of these investigators provided us with
illustrative data to be used in our subsequent work.

A number of sets of experimental data were received from Duluth.
The literature pertaining to chronic toxicity tests in general, and to
these tests in particular, was reviewed and discussed by the statistician
and the toxicologist. Once the experimental procedures were understood,
it was possible to begin considering the statistical aspects of the
problems. The statistical procedures discussed in the body of this
report combine methods taken from the statistical literature, where
appropriate, with methods developed especially for aquatic toxicity
testing applications, where standard procedures were felt not to be the
most appropriate.

Results 


Arguments for the use of standardized fish stocks and standardized
data reporting formats are presented. Aspects of the statistical analysis
of toxicity data are discussed and the suggested procedures are
illustrated with examples based on fish toxicity studies. Data analysis
topics discussed include: graphical displays, preliminary tests of tank
to tank heterogeneity within treatment groups, preliminary outlier
detection tests, overall tests of heterogeneity in response rates across
treatment groups, treatment group-control group pairwise multiple
comparison procedures, the fitting of standard and nonstandard dose
response curve models, analysis of variance and multiple regression
analyses on quantitative responses, statistical power and estimation
precision to be expected for various levels of sample size, and
suggestions for unequal allocation of experimental effort across
treatment groups with greater effort expended on the control group and
lower treatment groups.

Conclusions  and  Recommendations 


1.  The  USEPA  should  revise  and  update  the  standard  procedure  for  life 
cycle  tests  on  fathead  minnows. 

2.  Standardized  data  reporting  sheets  are  a  very  useful  adjunct  to 
the  categorization  and  analysis  of  chronic  toxicity  test  data. 


3. Guidelines on disposal of potentially hazardous effluent from
chronic toxicity tests should be incorporated in the procedure.


4. Detailed procedures for chemical analysis and quality assurance
of chemical data should be incorporated in the procedure.

5. Some of the "standard" methods currently used for analyzing data
from aquatic toxicity tests can and should be modified. The data
should first be graphed, outlying observations or groups of
observations should be located and the reason for their aberrant
behavior determined, and tests for heterogeneity among tanks
within groups should be carried out. Based on the results of
these preliminary inferences, the data should be modified or
adjusted to account for possible heterogeneity or aberrant values
before going on to the inferences of primary interest.

6. If hypothesis tests are to be used to compare the treatment group
and control group responses, they should be one-sided tests that
are sensitive to monotone alternatives, rather than overall
analysis of variance type "shotgun" tests.

7. Multiple comparison procedures and confidence interval procedures
should be used to determine specifically which treatment groups have
responses that differ from the control group responses and whether the
differences are of a specified biological significance. Significance
tests, by themselves, are not adequate to define an "MATC" (i.e.,
maximum acceptable toxicant concentration). Perhaps a confidence bound
should be routinely constructed at the MATC to determine just how much
worse than the control group the response at that concentration could
conceivably be. In general, confidence intervals impart much more
information than hypothesis tests and should be routinely used.

8. A good way to place monotone response structure on the problem,
to smooth the data, and to convert a hypothesis testing problem into
an estimation problem is to fit dose response curve models to the
data and to define the "safe" concentration as that which results in
no more than a specified increment in response over the control group.
A number of nonstandard variants on the "standard" dose response
models discussed in the literature may be useful. A nonparametric
approach to dose response estimation is feasible, has been implemented
in a computer program, and may on occasion be preferable to some of
the standard parametric dose response models.


9. Statistical power and estimation precision depend on both the
number of tanks run per group and the number of fish per tank. In
the presence of substantial tank to tank heterogeneity the effective
sample size may be closer to the number of tanks than to the number
of fish. Thus, in the presence of tank to tank heterogeneity,
increasing the number of fish without also increasing the number of
tanks per group yields diminishing returns.

10. Under certain circumstances it is sensible to allocate experimental
resources so that the control group and lower concentration groups
receive more tanks and fish than the higher concentration groups.
This results in greater inference sensitivity in the region of the
MATC. Proportional diluters should be modified to permit such
asymmetrical allocations of tanks, at the discretion of the
investigator.

11.  Statistical  power  or  statistical  precision  goals  should  be  stated 
as  part  of  the  protocol  for  each  individual  toxicity  test  and 
sample  sizes  should  be  determined  accordingly. 
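Items 9 and 10 can be made concrete with two standard textbook formulas, neither of which is taken from this report: the design effect for clustered observations, deff = 1 + (m - 1)ρ, where m is the number of fish per tank and ρ is an assumed intraclass (tank to tank) correlation, and the classical square-root rule for comparing k treatment groups with one control (control group size about √k times the per-treatment size). The Python sketch below is a minimal illustration under those assumptions; the function names and the value ρ = 0.10 are hypothetical.

```python
import math


def effective_sample_size(n_tanks: int, fish_per_tank: int, icc: float) -> float:
    """Approximate number of independent fish in one treatment group.

    Assumes a common (exchangeable) tank effect with intraclass correlation
    `icc`, a hypothetical planning value, so the design effect is
    1 + (fish_per_tank - 1) * icc.
    """
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [0, 1]")
    total_fish = n_tanks * fish_per_tank
    design_effect = 1.0 + (fish_per_tank - 1) * icc
    return total_fish / design_effect


def control_group_size(n_per_treatment: float, k_treatments: int) -> float:
    """Square-root allocation rule: n_control = n_per_treatment * sqrt(k)."""
    return n_per_treatment * math.sqrt(k_treatments)


if __name__ == "__main__":
    # With 2 tanks per group and icc = 0.10, adding fish gives diminishing
    # returns: the effective size stays below n_tanks / icc = 20.
    for m in (10, 25, 50, 100):
        print(m, round(effective_sample_size(2, m, 0.10), 1))
    # Control size for 20 fish per treatment and five treatment groups.
    print(round(control_group_size(20, 5), 1))
```

Run as a script, it prints effective sizes of 10.5, 14.7, 16.9 and 18.3 for 10, 25, 50 and 100 fish per tank, and a control size of 44.7. With ρ = 0 the effective size equals the number of fish, and with ρ = 1 it equals the number of tanks, which brackets the behavior described in item 9. Both formulas assume equal, independent tanks and are rough planning aids, not a substitute for the test-specific power goals of item 11.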
INTRODUCTION


During the 1960s and 1970s, environmental contamination in general,
and water pollution specifically, became increasingly important as
legal and scientific problems. Regulatory agencies needed scientific
data to support the notion that a problem existed and also needed
factual information for establishing tolerance limits for levels of
chemical discharges into surface waters. From that need evolved
numerous standard toxicity tests, including a test to determine the
long term effects of toxicants on a representative fish: the chronic
toxicity test with fathead minnows.

Aquatic toxicologists and biologists evolved and refined an effective
fish toxicity test. As data were analyzed and experiments designed,
the statistical considerations became more complex. It became clear
that some of the statistical procedures needed for the design of
toxicity tests and for the analysis of chronic toxicity data may be
novel or unique and should be developed specifically for fish toxicity
tests.

Operational activities of the U.S. Army (e.g., munitions manufacture)
involve the production, use, and/or discharge of a variety of commercial
chemicals. Safety data must be provided to USEPA concerning surface
water contamination due to discharges of chemical intermediates or the
final product. In-house research by the U.S. Army with the standard fish
chronic toxicity test highlighted the need for a reexamination of the
standard procedures, especially regarding statistical techniques. The
main goals of this project are to suggest statistical procedures for
analyzing data arising from such toxicity tests, to provide
recommendations for a more accurate, reliable standard procedure, and
to facilitate research in aquatic toxicology in general.

This project was initiated as an interdisciplinary investigation
of the EPA chronic toxicity test for fathead minnows. It combined the
efforts of a toxicologist, a fish specialist, and statisticians. The
biologists functioned as advisors to the statisticians in regard to the
characteristics and limitations of the test animal, test procedures, and
chronic toxicity data, and evaluated the test procedure from a
toxicological viewpoint.

The statisticians developed procedures for data storage,
transformation and analysis, scrutinized published statistical
techniques for their applicability to fish toxicity data, and devised
new statistical methods for analyzing data from fish toxicity tests
when they were felt to be more applicable than the standard methods.

This final report is the synthesis of a one year effort. It
discusses both biological and statistical aspects of the planning,
conduct, reporting, and data analyses associated with toxicity tests on
fathead minnows.


Arguments for the use of standardized fish stocks and standardized
data reporting formats are presented. Aspects of the statistical
analysis of toxicity data are illustrated with examples based on fish
toxicity studies. Data analysis topics discussed include: graphical
displays, preliminary tests of tank to tank heterogeneity within treatment
groups, preliminary outlier detection tests, adjustments in analysis
procedures due to tank to tank heterogeneity, overall tests of
heterogeneity in response rates across treatment groups, treatment
group-control group pairwise multiple comparison procedures, the fitting
of standard and nonstandard dose response curve models, analysis of
variance and multiple regression analyses on quantitative responses,
statistical power and estimation precision to be expected for various
levels of sample size, and suggestions for unequal allocation of
experimental effort across treatment groups with greater effort expended
on the control group and lower treatment groups.
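As an illustration of the first preliminary step, a within-group test of tank to tank heterogeneity for mortality counts, a standard Pearson chi-square test of homogeneity on the dead/alive by tank table can be sketched as follows. This is a textbook test, not necessarily the procedure developed later in this report; the function names are hypothetical, the p-value uses the Wilson-Hilferty normal approximation instead of an exact chi-square table, and the test assumes expected counts in every tank are not too small.

```python
import math
from typing import Sequence, Tuple


def chi2_upper_tail(x: float, df: int) -> float:
    """Approximate P(chi-square with df degrees of freedom > x).

    Uses the Wilson-Hilferty cube-root normal approximation, so the result
    is only approximate, especially for df = 1.
    """
    c = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def tank_homogeneity_chi2(dead: Sequence[int], at_risk: Sequence[int]) -> Tuple[float, int, float]:
    """Pearson chi-square test that all tanks in one treatment group share
    a common mortality rate (a 2 x T table of dead/alive by tank).

    Returns (statistic, degrees of freedom, approximate p-value).
    """
    if len(dead) != len(at_risk) or len(dead) < 2:
        raise ValueError("need matching counts for at least two tanks")
    if any(n <= 0 for n in at_risk):
        raise ValueError("every tank must start with at least one fish")
    df = len(dead) - 1
    pooled = sum(dead) / sum(at_risk)  # common mortality rate under H0
    if pooled in (0.0, 1.0):
        return 0.0, df, 1.0  # all alive or all dead: no evidence of heterogeneity
    # For a 2 x T table the Pearson statistic reduces to this per-tank sum.
    stat = sum((d - n * pooled) ** 2 / (n * pooled * (1.0 - pooled))
               for d, n in zip(dead, at_risk))
    return stat, df, chi2_upper_tail(stat, df)


if __name__ == "__main__":
    # Two hypothetical tanks of 50 fish each, with 3 and 12 deaths.
    print(tank_homogeneity_chi2([3, 12], [50, 50]))
```

For the two hypothetical tanks in the example, the statistic is about 6.35 on 1 degree of freedom, well beyond the 5% level, so the group would be flagged as heterogeneous. In that situation the tanks, rather than the individual fish, become the natural experimental units for the later comparisons.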

It  is  hoped  that  the  results  obtained  in  this  study  will  contribute 
to  better,  more  reliable  toxicity  tests  and  data  analyses.  This  in 
turn  should  provide  improved  tools  for  the  regulation  of  toxic  chemicals 
in  aquatic  environments  and  should  suggest  fertile  areas  for  further 
study  and  development. 


I.  ASSEMBLE  AND  EVALUATE  INFORMATION  ON  TEST  METHODS 


Although  the  scope  of  work  specified  that  design  and  analysis  of 
chronic  toxicity  tests  would  be  researched,  it  became  clear  early  in  the 
project  that  many  aspects  of  statistical  analysis  could  be  pursued  using 
exemplary  data  from  early  life  stage  tests.  Consequently,  the  review 
of  test  methods  and  our  literature  search  included  both  chronic  life- 
cycle  tests  and  early  life  stage  tests. 

Three literature sets were searched by computer using appropriate
key words (fathead minnow, toxicity, chronic tests, etc.): Mechanized
Information Center, The Ohio State University; Oak Ridge National
Laboratory, Oak Ridge, Tenn.; Ohio Environmental Protection Agency,
Columbus, Ohio. The latter search encompassed 13 major data base
searches (Amer. Chem. Soc., Biol. Abstracts, etc.). The hundreds of
citations received were reviewed for relevancy, and the important ones
were abstracted and filed. Reprints or copies of articles, 64 in all,
that were considered to be directly related to future tasks were
assembled, catalogued, and reviewed in detail.

These publications provided information on the fathead minnow in
regard to biology, life cycle events, duration of developmental stages,
nutritional information and reproductive characteristics. The papers
on test methods provided details of variation in design among
investigators and a large amount of experimental toxicity data for
reference and further discussion.

Papers and technical reports from EPA-Duluth describing the
apparatus [1] and procedure [2] for chronic toxicity tests were reviewed
and studied in detail in order to understand the method of exposure,
the physical arrangement of the delivery system, and important variations
among investigators, e.g. the syringe delivery method of DeFoe [3]. The
literature search and the perusal of research papers were essential for
the toxicology group to authoritatively interpret biological factors,
experimental data, or test methods in discussions with the statistical
group, or to suggest limitations in design due to the animal or technique.
II. CATEGORIZING DATA SETS


Early in the project, a number of sets of experimental data from
early life stage toxicity tests and chronic life cycle toxicity tests
were received from and discussed with researchers at EPA-Duluth.

Various aspects of these data were reviewed with the statistical
group to clarify experimental procedures, such as the types of
measurements and how they were acquired, whether measurements were
destructive or nondestructive, the replication of experiments, and the
variation of chemical concentration in the delivery system. These
discussions brought out the need for standardized data reporting sheets
so that data could be accurately categorized and recorded in a
systematic manner for entry into the computer and statistical analysis.
A separate section on standardized data sheets is included in this
report.

The  examples  in  this  report  are  based  on  some  of  these  experimental 
data  sets.  In  particular  the  data  sets  from  early  life  stage  toxicity 
tests  by 

Benoit  -  compound  A 

DeFoe  -  compound  C 

Holcombe  and  Phipps  -  compound  D 

Jarvinen  -  compound  B 


are  used.  These  data  sets  are  listed  in  Appendix  All. 


III.  ORGANIZATION  OF  DATA:  STANDARDIZED  DATA  RECORDS 

The  early  life  stage  data  and  chronic  life  cycle  toxicity  data  sets 
were  supplied  by  six  different  investigators.  Each  set  of  chronic  data 
was  received  in  a  unique  format.  Each  set  of  data  was  reviewed  for  the 
experimental  procedure  (if  available)  in  order  to  accurately  categorize 
the  data  for  storage  on  computer  and  subsequent  statistical  analysis. 
Routine  questions,  e.g.  how  many  days  of  exposure,  and  more  complex 
questions,  e.g.  are  replicate  tests  genuine  replications,  were  not  easily 
resolved  by  a  review  of  the  data  sheets,  nor  was  the  comparability  of 
the  same  categories  of  data  in  similar  experiments  among  the  array  of 
investigators.  Standardized  data  records  have  merit  if  they  are 
sufficiently  versatile  to  meet  most  needs,  clearly  summarize  the  exposure 
conditions  and  facilitate  transformation,  computer  storage,  and 
statistical analysis of raw data. The latter task is often done by an
individual who is not an expert in biological research and is unfamiliar
with operational details of toxicity tests.

Good  laboratory  practice  regulations  (GLP)  have  been  adopted  by 
FDA  for  nonclinical  laboratory  studies  [4]  to  assure  the  quality  of 
data  in  support  of  product  safety  decisions.  One  component  of  the  GLP 
deals  with  specific  record-keeping  practices  for  experimental  data. 

The advantages of these required record-keeping practices have been
discussed in regard to vertebrate experiments [5] and would apply
equally well to fish toxicity data, to the benefit of investigators,
statisticians, and regulatory agencies.

Although  there  is  some  variation  in  the  design  of  fish  toxicity 
tests,  certain  features  are  almost  universal.  For  example,  in  a  chronic 
toxicity  test,  a  flow-through  apparatus  is  always  used  and  standard 
measurements  include  hatchability  of  embryos,  fish  length  and  weight, 
survival  (mortality),  and  spawning  data.  Consequently,  standardized 
recording  sheets  could  be  devised  for  summarizing  exposure  methods  and 
experimental  data. 

Our standardized reporting sheets have two components: (a) a
descriptive section summarizing the conditions of the experiment with
code words or letters to categorize or define data for the statistician,
and (b) the raw data record sheet with no calculations or
transformations. These record sheets have been designed for data of
early life stage tests or chronic life cycle tests.

A.  Composite  of  Experimental  Conditions 

1.  Investigator _ 

2.  Toxicant  _ Source _ and _ %  purity 

3.  Starting  Date  of  Test  (Day  Zero) :  /  /  . 

D  M  Y 
4. Selection of embryos: _ Embryos selected randomly.

_ _ Embryos  examined,  only  viable 

eggs  incubated. 

_ Other  (specify) . 

5.  Are  embryos  from  paired  matings?  Yes  or  no. 

6. Fish I.D.: for use only with paired matings; use a unique
identifier here.

7.  Generation  of  embryos  or  fish?  Zero  or  first? 

(Note: In chronic tests, some investigators refer to spawnings of
exposed adults as first generation embryos, others as second generation.
We define zero generation as any stage or form used to start a test
and any stages during that generation, including adults. First
generation is any of the stages following zero generation adults.

One may argue against this system on a biological basis, but it
distinguishes between the same stage of separate generations. With
this system there is no second generation in a standard chronic life
cycle test.)

8. Nominal Concentration: identify tanks by nominal concentration
of toxicant (mg/l, µg/l), "solvent control" by "S", and "water only
control" by "W".

9. Identify replicate tanks in the Nominal Concentration column by "REP",
e.g. 0.25 REP; identify equivalent tanks by "EQ", e.g. 0.25 EQ.

(Note: The original data sheets should always make clear the relation
among replicate tanks of the same nominal concentration, a crucial
factor in deciding what statistical procedures to use. We define
replicate as the simultaneous exposure of fish to similar
concentrations of a chemical that are delivered independently, e.g.
two tanks containing nominal concentrations that originate from
separate syringes in the delivery system are replicates. We define
equivalent as the simultaneous exposure of fish to the same
concentration of a chemical that is delivered from a common origin, e.g.
groups of fish in several screened compartments of the same tank are
equivalent groups, as are fish in different tanks supplied equally
by tubing that is split after the final dilution. Replicate ("repeated
experiment") does not distort the conventional meaning of that
term. The choice of equivalent ("equal in quantity") was the best
approximation of what occurs.)

10. Tank I.D.: identify multiple tanks of the same nominal
concentration and type (REP or EQ) by capital letters, e.g. Rep-A,
Rep-B, etc.


11. Identify simultaneous incubations (same day, same tank) of embryos
from different spawnings by adding "x" to the "No. of days since
day zero" entry, e.g. 32x, 32x; for simultaneous incubations of
eggs from the same spawning, add a "y", e.g. 32y, 32y.

12. Embryo cup I.D.: identify multiple embryo cups in the same tank
by Tank I.D. and number, e.g. A-1, A-2.

13.  Identify  multiple  spawnings  on  the  same  date  by  adding  a  lower 
case  letter  to  the  entry,  "No.  of  days  since  day  zero",  e.g. 

32a,  32b. 

14. Initial exposure (day zero) as: embryos/fry/juveniles (circle one)

15.  Data  entries  on  one  line  are/are  not  from  the  same  fish. 
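The coding conventions in items 8 through 13 are regular enough to be checked mechanically when sheets are transcribed for the computer. A minimal Python sketch follows; the class and function names are hypothetical, and it assumes that numeric concentrations are written as plain decimals, that the letters x and y are reserved for the item 11 incubation codes, and that any other lowercase suffix denotes an item 13 spawning.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class NominalConcentration:
    value: Optional[float]    # numeric nominal concentration; None for controls
    control: Optional[str]    # "S" = solvent control, "W" = water-only control
    tank_type: Optional[str]  # "REP" (replicate), "EQ" (equivalent) or None


_CONC = re.compile(r"^\s*(?:(?P<ctl>[SW])|(?P<val>\d+(?:\.\d+)?))\s*(?P<type>REP|EQ)?\s*$")
_DAY = re.compile(r"^\s*(?P<day>\d+)(?P<suffix>[a-z]?)\s*$")
_CUP = re.compile(r"^\s*(?P<tank>[A-Z])-(?P<cup>\d+)\s*$")


def parse_nominal(entry: str) -> NominalConcentration:
    """Parse entries such as '0.25 REP', '0.25 EQ', 'S' or 'W' (items 8 and 9)."""
    m = _CONC.match(entry)
    if m is None:
        raise ValueError(f"unrecognized nominal concentration: {entry!r}")
    value = float(m["val"]) if m["val"] is not None else None
    return NominalConcentration(value, m["ctl"], m["type"])


def parse_day(entry: str) -> Tuple[int, str]:
    """Parse 'No. of days since day zero' entries such as '32', '32x', '32y'
    or '32a' (items 11 and 13) into (day, meaning of the suffix)."""
    m = _DAY.match(entry)
    if m is None:
        raise ValueError(f"unrecognized day entry: {entry!r}")
    suffix = m["suffix"]
    if suffix == "":
        meaning = "single entry"
    elif suffix == "x":
        meaning = "simultaneous incubation, different spawnings"
    elif suffix == "y":
        meaning = "simultaneous incubation, same spawning"
    else:
        meaning = f"spawning {suffix} on this date"
    return int(m["day"]), meaning


def parse_cup(entry: str) -> Tuple[str, int]:
    """Parse embryo cup IDs such as 'A-1' into (tank letter, cup number) (item 12)."""
    m = _CUP.match(entry)
    if m is None:
        raise ValueError(f"unrecognized embryo cup ID: {entry!r}")
    return m["tank"], int(m["cup"])
```

Rejecting malformed entries at transcription time, rather than during analysis, is the point of the design: a sheet that mixes, say, "0.25 rep" with "0.25 REP" fails loudly instead of silently producing two concentration groups.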

B. Data Sheets for Separate Categories of Data

1.  Survival  Data 

a. Experimental conditions: use entries 1, 2, 3, 7, 8, 9, 10,
and 14 from Part A to summarize the conditions of the experiment.

b.  Data  Sheet 


Investigator _  Toxicant _  %  purity 

Starting  date  of  Test  (day  zero) :  /  / 


Toxicant 


%  purity 


Measurements  taken _  days  after  first  exposure  of  this  stage. 


Starting  Date  of  Test  (day  zero) :  /  / 

D  M  Y 


Nominal Conc. | Tank I.D. | Length (mm) or Weight (mg) of Individual Fry: 1 2 3 4 5 6 7 8 9 10 11 12 13 14


3.  Hatchability  of  embryos 

a.  Experimental  conditions:  Use  entries  1,  2,  3,  4,  5,  6,  7, 
8,  9,  10,  11  and  12  from  Part  A  to  summarize  the  conditions 
of  the  experiment. 

b.  Data  Sheet 

Investigator  _ _ _ 


Toxicant 


%  purity 


Starting  date  of  Test  (day  zero) :  /  / 

D  M  Y 
Nominal Conc. | Embryo Cup I.D. (Tank, Cup) | Fish I.D. | No. days since day zero | No. embryos at start | Cum. no. hatched | Cum. no. unhatched | Cum. no. unaccounted for

4.  Spawning  Data 

a.  Experimental  conditions:  use  entries  1,  2,  3,  5,  6,  8,  10,  and 
13  from  Part  A  to  summarize  the  conditions  of  the  experiment. 

b.  Data Sheet

Investigator ________

Toxicant ________  % purity ____

Starting date of test (day zero): __/__/__ (D M Y)

Nominal  Tank  Fish  No. days since  No. of     No. of   Estimated conditions  Embryos used subsequently?
Conc.    I.D.  I.D.  day zero        spawnings  embryos  of embryos            Yes, No. Where?


a.  Experimental conditions: use entries 1, 2, 3, 6, 8, 9,
10, 14, and 15 from Part A to summarize the conditions
of the experiment.

b.  Data Sheet

Investigator ________

Toxicant ________  % purity ____

Starting date of test (day zero): __/__/__ (D M Y)

Nominal  Tank  Fish  No. days since  Length  Weight  Sex
Conc.    I.D.  I.D.  day zero        (mm)    (mg)    (m, f, im)


6.  A  data  report  sheet  for  chemical  analysis  of  water  should  be 
provided  if  detailed  instructions  for  chemical  analysis  are 
included  in  a  revised  procedure. 

7.  A separate sheet, which need not be standardized, should be
attached to the data records summarizing important conditions,
e.g. pH, temperature, photoperiod, flow rates, type of food
and feeding schedule, etc., and the limits of any conditions that
vary during the test.


Transfer  of  Experimental  Data  to  Standardized  Data  Sheets. 


IV.  ANALYSIS  OF  AND  COMMENTS  ON  THE  TEST  PROCEDURE 


The published procedure for chronic toxicity tests [2] has not
been revised since 1972. Since that time considerable research on
the test per se has been done at EPA - Duluth and elsewhere to improve
reliability, reproducibility and accuracy. For example, some
conditions specified in the 1972 procedure may be replaced by improved
techniques, e.g. handling and selection of embryos, use of paired
spawnings, etc. Those changes that could be incorporated as
improvements in the test procedure should be made and a revised version
published.

Following  are  comments  about  specific  sections  of  the  procedure, 
using  the  number  and  letter  designations  of  the  procedure  [2]  as  a 
reference. 

A.  Physical  System 

4.  Flow  Rate 

Recent USEPA regulations [7] designate certain chemicals as
hazardous wastes if and when they are discarded. Guidelines and
recommendations on the treatment (clean-up) of experimental tank
effluent should be included for a test system containing potentially
toxic chemicals in ten or more tanks, changing 6 to 10 tank
volumes per 24 hours in each tank, all operating continuously
for months.

14.  Where  surface  water  or  municipal  water  is  used,  a  filter 
system  should  be  considered. 

B.  Chemical  System 

2.  Measurement  of  toxicant  concentration  and 

5.  Methods 

A  much  more  detailed  procedure  should  be  incorporated  in  this 
section  in  conjunction  with  a  carefully  formulated  standardized 
reporting  sheet.  The  essence  of  this  suggestion  resides  in  the 
absolute  need  to  know  the  limits  of  chemical  concentration  changes 
and  to  have  assurance  that  the  chemical  analysis  data  are  reliable. 


V.  EARLY  LIFE  STAGE  TOXICITY  TESTS 


A.  Background.  In recent years there has been some movement
toward developing toxicity tests that provide much of the
information on chronic and sublethal toxicant effects that is
obtainable from full life cycle tests, yet require far fewer
resources of time, space, and cost, and are simpler to carry out
and analyze. To accomplish these aims, early life stage toxicity
tests have come into more common use. For fathead minnows such
early life stage tests require about thirty days, compared
with 250 to 300 days for full life cycle tests. This permits a great
many more compounds to be tested.

A number of guidelines for conducting early life stage tests in
a standardized manner have been proposed [8, 9, 10]. In these tests,
organisms are exposed during part of the embryonic stage, throughout
the larval stage, and during part of the juvenile stage. The
rationale is that this is the period of greatest sensitivity of
the fish, so chronic and sublethal toxicant effects will be
revealed.

In one version of the test, groups of recently fertilized fish
embryos are placed in embryo cups within test chambers. There are
generally five or more toxicant groups and one or more control groups.
Each (treatment or control) group consists of two or more replicate
test chambers. The embryos are kept on test until they hatch (about
5 to 7 days), at which point the live, normal larvae are thinned to
the desired number per tank, and these are kept on test for about
four more weeks, at which point the test is terminated. In a
variant of this approach, the embryos are thinned after just two days
on test. After hatch, all of the live larvae are released into the
test chambers for the rest of the test. This avoids handling the
newly hatched larvae at a time when they are most sensitive to the
toxicant.

B.  Data.  The data recorded in such early life stage tests include
the number of embryos per embryo cup, number of embryos hatched live
and hatched normal, number of fry in each embryo cup after thinning,
number of fry live at end of test and number normal, individual
weights of all fry alive at end of test, and periodic toxicant
concentration measurements within each tank.

Standardized data reporting sheets that facilitate the
interpretation of test results and the communication of these results
among investigators and laboratories have been developed by
investigators at USEPA - Duluth. They have been kind enough to supply
us with such sheets from about twenty early life stage tests (personal
communication). Figure V.1 illustrates such a basic data reporting
sheet, based on the test of compound C carried out by DeFoe.
Page 1 contains embryo and fry survival and normality data and a
diagram showing the test layout. We see that in this test there was
a single control group (1), five treatment groups (2 to 6), two test
chambers per group, and a single embryo cup per test chamber. Page 2
contains individual weight measurements on all the fry that survived
the test. Page 3 contains the results of the individual toxicant
concentration measurements made periodically in each chamber
throughout the test. We have found these data reporting sheets to be
very easy to understand and very useful.

In  order  to  work  with  the  data  it  was  necessary  to  put  them 
into  computer  readable  form.  The  approach  that  we  took  to 
accomplish  this  is  illustrated  in  Figure  V.2  for  the  data  from 
the  test  of  compound  C  by  DeFoe.  The  three  types  of  data  —  survival, 
weight,  and  toxicant  concentration —  are  represented  in  three  "card 
types."  The  data  for  each  "card  type"  are  listed  in  Figure  V.2. 

Some applications call for use of just one card type while others
call for use of two or more card types. The first six entries on
each card are the same across card types: treatment group (col 2),
replicate designation (col 4), card type (col 6), card number
(cols 7-8), investigator code (cols 9-10), and test code (cols 11-12).
This provides enough information to sort the cards by investigator,
experiment, type, group, and sequence should the data become
disarranged. Card type 1 (survival data) contains, in addition, the
number of embryos tested (cols 16-20), number hatched live
(cols 21-25), number of fry tested (cols 31-35), number live at end of
test (cols 36-40), and number normal at end of test (cols 41-45).
Card type 2 (weight data) contains the number of weights recorded from
that particular chamber (cols 14-15) and the individual weights
(5 cols per weight, up to 13 weights per card). Card type 3 (toxicant
concentration) contains month (cols 16-17), day (cols 18-19), year
(cols 20-21), and toxicant concentration (cols 32-38), one
determination per card. At the head of each type of information
several lines of descriptive text are given. This text is informative
when the data are printed out but is skipped over for purposes of
analysis.
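Reading such a file needs only the column positions listed above. The sketch below decodes one card image into a Python dictionary; the field names are ours. Two details are assumptions not stated in the text: that the 13 weight fields on a type 2 card begin in column 16 (13 fields of 5 columns then fill columns 16-80 of an 80-column card), and that descriptive text lines can be recognized because column 6 does not hold a valid card type.

```python
# Column ranges are 1-based and inclusive, following the card layout described above.
COMMON = {"group": (2, 2), "replicate": (4, 4), "card_type": (6, 6),
          "card_number": (7, 8), "investigator": (9, 10), "test": (11, 12)}
SURVIVAL = {"embryos_tested": (16, 20), "hatched_live": (21, 25),
            "fry_tested": (31, 35), "live_at_end": (36, 40), "normal_at_end": (41, 45)}
DATE = {"month": (16, 17), "day": (18, 19), "year": (20, 21)}
WEIGHT_START = 16  # ASSUMPTION: 13 weight fields x 5 cols occupy cols 16-80


def cols(card, first, last):
    """Text in columns first..last (1-based, inclusive), blanks stripped."""
    return card[first - 1:last].strip()


def parse_card(line):
    """Decode one 80-column card image; return None for descriptive text lines."""
    card = line.rstrip("\n").ljust(80)
    record = {name: cols(card, *span) for name, span in COMMON.items()}
    kind = record["card_type"]
    if kind not in {"1", "2", "3"}:
        return None  # ASSUMPTION: text lines have no valid card type in col 6
    if kind == "1":    # survival data
        record.update({name: int(cols(card, *span)) for name, span in SURVIVAL.items()})
    elif kind == "2":  # individual weights
        record["weights_in_chamber"] = int(cols(card, 14, 15))
        fields = (cols(card, c, c + 4) for c in range(WEIGHT_START, 81, 5))
        record["weights"] = [float(f) for f in fields if f]
    else:              # toxicant concentration, one determination per card
        record.update({name: int(cols(card, *span)) for name, span in DATE.items()})
        record["concentration"] = float(cols(card, 32, 38))
    return record
```

Applying `parse_card` to every line of a file and keeping the non-`None` results gives the "basic data" records; the shared identifying fields make it easy to sort or group them by investigator, test, group, and replicate.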

We  have  found  this  data  organization  to  be  easy  to  prepare, 
easy  to  maintain,  and  easy  to  use.  Such  data  files  represent  the 
"basic  data"  for  all  subsequent  analyses  discussed  in  this  report. 
VI.  PRELIMINARY GRAPHICAL DATA DISPLAYS


Graphing the data is generally considered to be a good first
step in analyzing data. Graphs provide insights into the structure
of the data and reveal the presence of possibly unanticipated
relations or anomalies in the data.

Figures VI.1 through VI.8 illustrate the kinds of information
that can be obtained from preliminary plots. They show
percentage embryo and fry mortality and abnormality observed in
early life stage tests on fathead minnows conducted by DeFoe
with compound C and by Holcombe and Phipps with compound D. The
tests each consist of a control group (1) and five treatment groups
(2 - 6) with toxicant levels in roughly geometric progression. The
DeFoe test was run with two chambers per group and the Holcombe and
Phipps test with four. The plotting symbol "A" represents a single
response, "B" represents two coincident responses, etc.

Figures VI.1 and VI.5 reveal no trend in embryo mortality with
increasing toxicant level in either test. Tank to tank variation
within treatment groups appears to be approximately constant across
groups, except for a single tank in Group 2 (Figure VI.1), which has
about 50 percent greater embryo mortality than all the other tanks
in the test. It appears to be an outlier, i.e. its response does
not seem to conform to the pattern of the bulk of the data.

Figures VI.2 and VI.6 show increasing trends in fry mortality
with toxicant concentration in each test. This pattern is to be
expected, since the larvae are most sensitive to the toxicant shortly
after hatching. In each test, tank to tank variation within groups is
greatest in the middle and least at the ends, in conformance with
binomial theory. No outlying tanks are evident with respect to fry
mortality. Note that in both tests the highest treatment groups
experience 100 percent fry mortality.
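The "greatest in the middle, least at the ends" pattern follows from the binomial variance p(1 - p)/n of a tank's observed proportion, which is largest at p = 0.5 and zero at p = 0 or 1. The sketch below tabulates that standard deviation for n = 20 fish per tank, roughly the number of fry per tank in DeFoe's test; the function name is ours.

```python
import math


def tank_sd(p, n=20):
    """Binomial standard deviation of the observed proportion dead in a tank of n fish."""
    return math.sqrt(p * (1 - p) / n)


for p in (0.0, 0.05, 0.25, 0.5, 0.75, 0.95, 1.0):
    print(f"true rate {p:4.2f}: SD of tank proportion = {tank_sd(p):.3f}")
```

The standard deviation is about 0.11 at a true rate of 0.5 and falls to zero at 0 and 1, so tanks in groups with intermediate mortality are expected to scatter most.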

Figures VI.3 and VI.7 exhibit embryo abnormality in the two
tests. They are strikingly similar. In the control groups and the
four lowest concentration groups there is little or no abnormality
among newly hatched live larvae. However, in the highest
concentration groups there is 100 percent abnormality among newly
hatched live larvae. It thus appears that at very high concentrations
each of these toxicants penetrates the embryo.

Figures VI.4 and VI.8 exhibit fry abnormality in the two tests.
In brief, there is none. After 32 days the fry have either died or
are normal. Recall that the highest toxicant groups experience 100
percent mortality, so for them there are no abnormality data to plot.
Figure VI.1  DeFoe compound C percentage embryo mortality by treatment group

Figure VI.2  DeFoe compound C percentage fry mortality by treatment group

Figure VI.3  DeFoe compound C percentage embryo abnormality by treatment group

Figure VI.5  Holcombe and Phipps compound D embryo mortality by treatment group

Figure VI.6  Holcombe and Phipps compound D fry mortality by treatment group

Figure VI.7  Holcombe and Phipps compound D embryo abnormality by treatment group

Figure VI.8  Holcombe and Phipps compound D fry abnormality by treatment group
VII.  TESTING FOR TANK TO TANK HETEROGENEITY WITHIN TREATMENT GROUPS

A.  Background.  In  order  to  assess  variability  of  response,  toxicity 
tests  generally  include  several  fish  tanks,  usually  two  to  four, 
within  each  treatment  or  control  group.  EPA  guidelines  for  early 
life  stage  and  full  life  cycle  toxicity  tests  with  fathead  minnow 
[2,  8]  call  for  at  least  two  replicate  test  chambers  for  each 
treatment  group.  Some  tests  use  more. 

An important preliminary inference in toxicology data analyses
is to determine whether there is any statistical evidence of
variation in response among tanks within treatment groups.
Such variation might be due to differences in location or handling
of individual tanks, to fungus or illnesses that might invade a
tank, to unforeseen accidents during the test, etc.

If evidence of tank to tank heterogeneity exists, then analyses
should be carried out on a per tank basis. If no evidence of tank
heterogeneity exists, then data might be pooled across tanks within
groups and analyses carried out on a per fish basis, ignoring the
replicate tanks. For example, mortality rates could be compared
based on binomial theory. Such per fish analyses would provide
many more degrees of freedom to estimate random error than would
per tank analyses, and so are more sensitive. For example, if there
are four tanks per group and 25 fish per tank, then a per fish
analysis might be based on 99 degrees of freedom per group (100 fish
minus 1), whereas a per tank analysis would be based on just three
degrees of freedom per group (4 tanks minus 1).

However, the validity of per fish analyses rests on the absence
of tank to tank heterogeneity. If there is in fact variation in
response rate across tanks within treatment groups, then variability
estimates based on per fish analyses will underestimate the
true variability of the estimates and test statistics. This
will result in standard error estimates that are too small,
confidence intervals that are too short, and hypothesis testing
procedures that falsely reject the null hypothesis more often than
their nominal rates (i.e. inflated alpha levels). It is thus
important to test for the presence of tank to tank heterogeneity
within treatment groups before proceeding to the analysis
of primary interest.
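The inflation of the alpha level can be illustrated by simulation. The setup below is hypothetical, not data from this report: two groups with the same mean response rate of 0.3, four tanks of 25 fish each (the numbers used in the example above), and a per fish Pearson chi square test at nominal alpha = 0.05. In the heterogeneous case each tank's response rate is drawn from a beta distribution with mean 0.3 and intra-tank correlation 0.1, an assumed model for tank to tank variation.

```python
import random


def pooled_chi2(dead, n_per_group):
    """Per fish Pearson chi square (1 d.f.) comparing two groups of equal size."""
    p = sum(dead) / (2 * n_per_group)
    if p in (0.0, 1.0):
        return 0.0
    return sum((x - n_per_group * p) ** 2 for x in dead) / (n_per_group * p * (1 - p))


def homogeneous(rng):
    return 0.3  # every tank has the same response rate


def heterogeneous(rng):
    # Hypothetical tank effect: beta with mean 0.3 and intra-tank correlation 0.1.
    return rng.betavariate(2.7, 6.3)


def false_rejection_rate(tank_rate, tanks=4, fish=25, sims=1000, seed=1):
    """Share of simulated tests rejecting at alpha = 0.05 although both groups
    have the same mean response rate (the null hypothesis holds)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(sims):
        dead = []
        for _group in range(2):
            count = 0
            for _tank in range(tanks):
                p = tank_rate(rng)
                count += sum(rng.random() < p for _fish in range(fish))
            dead.append(count)
        if pooled_chi2(dead, tanks * fish) > 3.841:  # upper 5% point, 1 d.f.
            rejections += 1
    return rejections / sims


print("homogeneous tanks:  ", false_rejection_rate(homogeneous))
print("heterogeneous tanks:", false_rejection_rate(heterogeneous))
```

With homogeneous tanks the false rejection rate stays near the nominal 0.05; under the assumed tank to tank variation it is several times larger, which is the inflated alpha level described above.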

B.  Remarks  on  Some  "Standard"  Procedures. 

Finney [11], section 9.1, pp. 175 ff., suggests the following
procedure for testing tank to tank heterogeneity. Fit a probit
curve to the data pooled across tanks within groups.
Fit a probit curve to the data using the individual tanks within
groups. The point estimates of the two probit fits will be
exactly the same. However, the residual chi squares and their
respective degrees of freedom will differ. The difference
between these two residual chi squares can be interpreted as
the chi square for heterogeneity among tanks within treatment
groups. Similar considerations hold for the usual chi square
test for homogeneity.

We  have  carried  out  this  procedure  using  the  probit  fit  and 
using  the  usual  chi  square  test  for  homogeneity.  We  compare  the 
results  of  these  two  tests.  The  theoretical  bases  of  these 
heterogeneity  tests  are  discussed  in  Appendix  AVII. 

We  illustrate  these  two  heterogeneity  test  procedures  on  the 
fry  mortality  data.  First  consider  the  test  of  compound  C  by 
DeFoe. 

Probit  Fit 

We fitted the probit model to the treatment groups using the
(natural) logarithm of concentration and excluding the control
group. (Note that the same fit was obtained when the control
group was included.) The probit fits were carried out using the
PROC PROBIT procedure in the SAS statistical computing system
[12]. The data consist of I = 5 treatment groups, J = 2 tanks per
group. Thus there are 10 responses to which we fit the
two-parameter probit model.

$$P_i = \Phi(\alpha - 5 + \beta \log C_i), \qquad i = 2, \ldots, 6$$
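A minimal sketch of the pooled-tank fit, assuming maximum likelihood by Fisher scoring. We do not claim this reproduces SAS PROC PROBIT's internals, only the kind of estimates it reports. The data are the pooled fry mortality counts for groups 2 to 6 in DeFoe's test (dead of total per group), with natural log concentration; the intercept `a` corresponds to alpha - 5 in the equation above, and `fit_probit` is our name.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal: Phi = STD.cdf, phi = STD.pdf


def fit_probit(conc, dead, total, max_iter=50, tol=1e-10):
    """ML fit of P = Phi(a + b ln C) by Fisher scoring; a corresponds to alpha - 5."""
    x = [math.log(c) for c in conc]
    # Start from a least squares line through the empirical probits, with
    # observed proportions clipped away from 0 and 1 so inv_cdf is defined.
    z = [STD.inv_cdf(min(max(d / n, 0.5 / n), 1 - 0.5 / n)) for d, n in zip(dead, total)]
    xm, zm = sum(x) / len(x), sum(z) / len(z)
    b = sum((xi - xm) * (zi - zm) for xi, zi in zip(x, z)) / sum((xi - xm) ** 2 for xi in x)
    a = zm - b * xm
    for _ in range(max_iter):
        u0 = u1 = i00 = i01 = i11 = 0.0
        for xi, d, n in zip(x, dead, total):
            eta = a + b * xi
            p = min(max(STD.cdf(eta), 1e-12), 1 - 1e-12)
            w = STD.pdf(eta) / (p * (1 - p))
            score = (d - n * p) * w        # d(log-likelihood)/d(eta) for this group
            info = n * STD.pdf(eta) * w    # expected information for eta
            u0 += score
            u1 += score * xi
            i00 += info
            i01 += info * xi
            i11 += info * xi * xi
        det = i00 * i11 - i01 * i01
        da = (i11 * u0 - i01 * u1) / det
        db = (i00 * u1 - i01 * u0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


conc = [1.991, 5.976, 14.812, 48.307, 146.984]   # groups 2-6
dead = [0, 2, 1, 9, 40]                          # tanks pooled within group
total = [40, 40, 41, 40, 40]
a, b = fit_probit(conc, dead, total)
fitted = [STD.cdf(a + b * math.log(c)) for c in conc]
print(f"alpha - 5 = {a:.3f}, beta = {b:.3f}")
print("fitted response rates:", [round(p, 6) for p in fitted])
```

The fitted rates should agree closely with the p̂_i column of the pooled-tank table later in this section (for example, about 0.42 for group 5). Because the tanks within a group share a concentration, fitting the individual tanks gives the same estimates, as noted above.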

Figures VII.1 and VII.2 contain the results of the probit fits to
the individual responses and to the responses pooled across
tanks within groups, respectively. The analysis of variance table,
as suggested by Finney, appears in the bottom portion of
Figure VII.1. The heterogeneity chi square has 5 d.f., the
difference between the 8 (= 10 - 2) residual d.f. for the individual
tanks and the 3 (= 5 - 2) residual d.f. for the pooled tanks. The
upper 0.005 point of the chi square distribution with 5 d.f. is 16.75.
Thus this test for tank to tank heterogeneity within groups is
"highly statistically significant". At face value this suggests
strong statistical evidence of variation in response rate across
tanks within treatment groups.
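The tail probabilities quoted in this section (the 0.005 point for 5 d.f. here, and later the 24 percent point for 18 d.f.) can be checked without tables. The sketch uses the standard closed forms of the chi square upper tail for integer degrees of freedom; nothing in it is taken from the report except the two residual chi squares, 87.7666 (individual tanks) and 47.6775 (pooled tanks), reported from SAS PROC PROBIT for DeFoe's fry data.

```python
import math
from statistics import NormalDist


def chi2_sf(x, df):
    """Upper tail probability P(X^2 > x) of a chi square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: 2[1 - Phi(sqrt x)] + 2 phi(sqrt x) * sum_{k=1}^{(df-1)/2} x^((2k-1)/2) / (1*3*...*(2k-1))
    s = math.sqrt(x)
    std = NormalDist()
    term, total = s, 0.0
    for k in range(1, (df - 1) // 2 + 1):
        if k > 1:
            term *= x / (2 * k - 1)
        total += term
    return 2 * (1 - std.cdf(s)) + 2 * std.pdf(s) * total


heterogeneity = 87.7666 - 47.6775   # individual tanks minus pooled tanks
print(f"heterogeneity chi square = {heterogeneity:.4f} on 5 d.f., P = {chi2_sf(heterogeneity, 5):.2e}")
print(f"P(chi square with 5 d.f. > 16.75) = {chi2_sf(16.75, 5):.4f}")
```

The heterogeneity chi square of about 40.1 on 5 d.f. lies far beyond the 0.005 point, which is what makes the probit-based test "highly significant".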

Chi  Square  Fit 


We now carry out a chi square test of heterogeneity in response
rates across tanks within groups based on the usual chi square
test of homogeneity across groups. Figures VII.3 and VII.4
contain the results of the chi square tests based on the individual
responses and on responses pooled across tanks within groups,
respectively. Control group responses are included in these
tests. The tests were carried out using the PROC FREQ procedure
in the SAS statistical computing system [12]. The analysis of
variance table suggested by Finney appears below.


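Source                                            d.f.      SS

Lack of fit of pooled tanks
  about model (homogeneity)                          5     181.783
Variation of individual tanks
  w/i tmnt groups (by subtraction)                   6       1.892
Lack of fit of individual tanks
  about model (homogeneity)                         11     183.675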

The heterogeneity chi square is thus very small, and there is no
statistical evidence of variation among tanks within treatment
groups.

Note  that  the  conclusions  arrived  at  from  this  heterogeneity 
chi  square  test  are  in  direct  contradiction  to  those  arrived  at 
from  the  heterogeneity  chi  square  test  based  on  the  probit  fit. 
What  is  the  cause  of  the  discrepancy? 

There are two possible sources of difficulty. The first
concerns the probability estimates in the denominators of the
test statistics. In the probit-based statistic the i-th group
response rate in the denominator is estimated as

$$\hat p_i = \hat\Phi_i = \Phi(\hat\alpha - 5 + \hat\beta \log C_i),$$

whereas in the homogeneity chi square based statistic the i-th group
response rate in the denominator is estimated as
$\hat p_i = \hat P = X_{++}/N_{++}$ for all i. The assumption of a
constant $\hat P$ value in the denominator is clearly not justified.

The assumption of $\hat p_i$ values based on the probit model is also
not good, as can be seen from the very large residual chi square
value in Table VII.2. Thus the substantial differences in the
response rate estimates that appear in the denominators of the two
statistics, along with the probable inadequacies of both sets of
estimates, may account for at least a portion of the discrepancy in
chi square values.

The second possible source of discrepancy concerns the validity of
the chi square approximation itself. The asymptotic chi square
theory is valid only if the cell expected frequencies are large
enough. In particular, if responses are observed in cells with very
small expected frequencies, very large cell chi squares can result,
which can greatly inflate the statistic.

Consider the two chi square statistics for lack of fit from
the probit model, one for individual tanks and one for pooled
tanks. We break out the individual components of these
statistics.


Table VII.1  Probit fit to groups 2, 3, 4, 5, 6, individual tanks:
             chi square statistic for lack of fit to the probit model

Grp  Tank  Mean conc.  Dead  Live  Total   p̂_i       N_ij p̂_i   N_ij p̂_i q̂_i      X²
             (C_i)
 2    A       1.991      0    20     20   0.000015   0.000301    0.000301     0.000301
 2    B       1.991      0    20     20   0.000015   0.000301    0.000301     0.000301
 3    A       5.976      0    20     20   0.002542   0.050830    0.050701     0.05096
 3    B       5.976      2    18     20   0.002542   0.050830    0.050701    74.9347
 4    A      14.812      0    21     21   0.0475     0.9975      0.9501       1.0473
 4    B      14.812      1    19     20   0.0475     0.95        0.9049       0.00276
 5    A      48.307      4    16     20   0.4227     8.454       4.8805       4.0648
 5    B      48.307      5    15     20   0.4227     8.454       4.8805       2.4444
 6    A     146.984     20     0     20   0.88356   17.6713      2.05764      2.6355
 6    B     146.984     20     0     20   0.88356   17.6713      2.05764      2.6355
                                                                   Total    87.8165

This compares with X² = 87.7666 calculated from SAS PROC PROBIT.


Table VII.2  Probit fit to groups 2, 3, 4, 5, 6, tanks pooled within group:
             chi square statistic for lack of fit to the probit model

Grp  Mean conc.  Dead  Live  Total   p̂_i       N_i+ p̂_i   N_i+ p̂_i q̂_i      X²
       (C_i)
 2      1.991      0    40     40   0.000015   0.000602    0.000602     0.000602
 3      5.976      2    38     40   0.002542   0.10166     0.10142     35.532
 4     14.812      1    40     41   0.0475     1.9475      1.855        0.484
 5     48.307      9    31     40   0.4227    16.908       9.761        6.407
 6    146.984     40     0     40   0.88356   35.3426      4.11528      5.271
                                                             Total    47.695

This compares with X² = 47.6775 calculated by SAS PROC PROBIT.

Comparison  of  these  two  chi  square  values  clearly  shows  the 
source  of  the  "significant"  chi  square  for  heterogeneity.  Namely 
the  tanks  from  group  3  have  very  small  expected  frequencies  (NP)  yet 
have  observed  responses.  Thus  these  component  chi  square  values 
are  large  and  dominate  the  overall  chi  square  values. 
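The component chi squares in the individual-tank and pooled-tank tables above can be recomputed from the counts and the fitted rates p̂_i, which makes the dominance of group 3 easy to see. This sketch uses the rounded p̂_i values printed in the tables, so its totals differ slightly from the tabulated ones; the variable names are ours.

```python
# Counts (dead, total) per tank and the rounded fitted probit rates p_hat by group.
tanks = {
    (2, "A"): (0, 20), (2, "B"): (0, 20),
    (3, "A"): (0, 20), (3, "B"): (2, 20),
    (4, "A"): (0, 21), (4, "B"): (1, 20),
    (5, "A"): (4, 20), (5, "B"): (5, 20),
    (6, "A"): (20, 20), (6, "B"): (20, 20),
}
p_hat = {2: 0.000015, 3: 0.002542, 4: 0.0475, 5: 0.4227, 6: 0.88356}


def cell_chi2(dead, n, p):
    """Component (X - N p)^2 / (N p q) for one row of the table."""
    return (dead - n * p) ** 2 / (n * p * (1 - p))


separate = {key: cell_chi2(d, n, p_hat[key[0]]) for key, (d, n) in tanks.items()}

pooled_counts = {}
for (grp, _tank), (d, n) in tanks.items():
    tot_d, tot_n = pooled_counts.get(grp, (0, 0))
    pooled_counts[grp] = (tot_d + d, tot_n + n)
pooled = {g: cell_chi2(d, n, p_hat[g]) for g, (d, n) in pooled_counts.items()}

sep_total = sum(separate.values())
pool_total = sum(pooled.values())
sep_wo3 = sum(v for key, v in separate.items() if key[0] != 3)
pool_wo3 = sum(v for g, v in pooled.items() if g != 3)
print(f"separate tanks: {sep_total:.3f}  without group 3: {sep_wo3:.3f}")
print(f"pooled tanks:   {pool_total:.3f}  without group 3: {pool_wo3:.3f}")
for key in sorted(separate, key=separate.get, reverse=True)[:2]:
    print("largest component:", key, round(separate[key], 3))
```

Tank 3B alone contributes about 75 of the roughly 88 for the separate tanks; dropping group 3 leaves roughly 12.8 (separate) and 12.2 (pooled).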

If we remove the group 3 values from the chi square statistics
we have:

Separate tanks:  $X^2 = 87.8165 - 0.05096 - 74.9347 = 12.831$

Pooled tanks:    $X^2 = 47.695 - 35.532 = 12.163$

The relation between these two chi square statistics is then
just like that of the chi square tests resulting from the
contingency table tests: the heterogeneity chi square obtained by
subtraction is small (about 0.7).

Moral:  Uncritical use of the chi square test for homogeneity of
        tanks within concentration groups recommended by Finney
        can lead to completely incorrect results, and to results
        contradictory to those of other homogeneity tests, because of:

        •  small expected frequencies within cells

        •  response rate estimates based on the particular (possibly
           inappropriate) model fitted

We  repeated  the  same  calculations  on  the  fry  mortality  data 
in  the  test  of  compound  D  by  Holcombe  and  Phipps. 

Probit  Fit 


Using all six groups, a logarithmic transformation of concentration,
and Abbott's correction for background response, we
obtain:


Source                                            d.f.      S.S.

Lack of fit of pooled tanks
  about probit model                                 3       0.5064
Variation of individual tanks
  w/i tmnt groups (by subtraction)                  18      21.7906
Lack of fit of individual tanks
  about probit model                                21      22.2952


The value 21.7906 is at the upper 24 percent point of a chi square
distribution with 18 d.f. and so is nonsignificant.
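For reference, Abbott's correction is commonly incorporated by letting a background response rate $c$ (the response expected without any toxicant) enter the probit model as shown below; we assume this standard form is the one used in the fit above. Taking $\Phi(\cdot) = 0$ for the control group ($C = 0$), the model has three parameters ($c$, $\alpha$, $\beta$), which accounts for the degrees of freedom in the table: $6 - 3 = 3$ for the pooled tanks and $24 - 3 = 21$ for the individual tanks.

$$P_i = c + (1 - c)\,\Phi(\alpha - 5 + \beta \log C_i), \qquad i = 1, \ldots, 6$$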

Chi  Square  Fit 

There are I = 6 groups, J = 4 tanks per group. We carry out
chi square tests of heterogeneity in response rates across tanks
within groups based on the usual chi square test of homogeneity
across groups. We obtain:
Source                                            d.f.      S.S.

Lack of fit of pooled tanks
  about model (homogeneity)                          5     389.676
Variation of individual tanks
  w/i tmnt groups (by subtraction)                  18      10.198
Lack of fit of individual tanks
  about model (homogeneity)                         23     399.874

The heterogeneity chi square is again small. There is no
statistical evidence of variation among tanks within treatment
groups. Note, however, that there is strong statistical evidence
of variation in response rate from group to group, as would be
expected. Thus the weights in the denominator of the heterogeneity
chi square statistic, which assume a common response rate
$\hat P = X_{++}/N_{++}$ for every group, are suspect.

Separate  Heterogeneity  Tests  Within  Treatment  Groups 

We have seen for DeFoe's fry mortality data that we can obtain
diametrically opposite conclusions about heterogeneity of response
within treatment groups depending on whether the test for
homogeneity was based on a probit model fit or on a contingency table
fit. This was attributed to

1.  differences in the weights used in the denominators of
the chi square statistics (based on the assumed model)

2.  small expected frequencies within cells that invalidate
asymptotic distribution theory.

To account for problem (1), we carry out separate chi square
heterogeneity tests within each concentration group, without
imposing any structure on the form of the concentration-response
relation. We do this by carrying out separate chi square tests
within each group and then pooling the results across groups.
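The per-group statistics and their sum can be sketched directly. As an illustration we apply the usual asymptotic chi square for homogeneity within each group to DeFoe's fry mortality counts for groups 2 to 6 (dead and total per tank, as listed in the individual-tank probit table earlier in this section; the control group counts are not listed there). As an assumed convention, a group in which all or no fish died contributes zero; how degrees of freedom are counted for such groups is left open here.

```python
def chi2_homog(dead, total):
    """Usual chi square for homogeneity of the tanks in one group (a 2 x K table)."""
    p = sum(dead) / sum(total)
    if p in (0.0, 1.0):
        return 0.0  # assumed convention: all or no fish died, so no tank contrast
    q = 1.0 - p
    return sum((x - n * p) ** 2 / (n * p * q) for x, n in zip(dead, total))


# DeFoe compound C fry mortality, dead and total per tank (tanks A, B) for groups 2-6.
defoe = {2: ([0, 0], [20, 20]), 3: ([0, 2], [20, 20]), 4: ([0, 1], [21, 20]),
         5: ([4, 5], [20, 20]), 6: ([20, 20], [20, 20])}

per_group = {g: chi2_homog(d, n) for g, (d, n) in defoe.items()}
pooled_within = sum(per_group.values())
for g, value in per_group.items():
    print(f"group {g}: {value:.4f}")
print(f"sum over groups: {pooled_within:.4f}")
```

The sum, about 3.3, is small, in contrast to the probit-based heterogeneity chi square of about 40; but groups 3 and 4 have expected dead counts near 1, so even this value rests on asymptotic theory that does not really apply.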

There is, however, a technical problem associated with this
approach. For many (if not most) of the responses of interest
the probabilities of occurrence are fairly close to 0 or 1.
Therefore the expected frequencies of occurrence can be rather small,
thus invalidating the use of asymptotic chi square theory. We
illustrate this phenomenon with the Holcombe and Phipps fry
mortality data, broken down by group. The output (from SAS PROC
FREQ) is shown in Figures VII.5 to VII.10.

We see that groups 1, 2, and 3 have small expected numbers of
dead fry (less than or equal to 2.0). Group 4 has expected dead = 3.3
and group 6 has expected live per tank = 0. Thus groups 1, 2, 3,
4, and 6 have small expected frequencies in at least some of the
cells of the table.


We use a (relatively stringent) criterion of applicability of
asymptotic chi square theory that requires an expected frequency of
at least 5 within each cell of the table.

Only group 5 satisfies this criterion within the Holcombe and
Phipps data. We must therefore base some of the within-group
heterogeneity tests on exact, small sample theory.
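Both applicability criteria, the one above and the more liberal Dixon and Massey rule [13] quoted in the footnote of this section, can be checked mechanically from the expected frequencies under homogeneity, $N_{ij} X_{i+}/N_{i+}$ for dead fish and the complement for live fish. A minimal sketch; the function names are ours. As examples, DeFoe's group 5 tanks (4 of 20 and 5 of 20 dead) and a hypothetical group with 42 deaths among four tanks of 25 fish.

```python
def expected_cells(dead, total):
    """Expected dead and live counts for each tank if the group were homogeneous."""
    p = sum(dead) / sum(total)
    return [e for n in total for e in (n * p, n * (1 - p))]


def all_cells_at_least_5(dead, total):
    """Criterion used in the text: every expected cell frequency is at least 5."""
    return all(e >= 5 for e in expected_cells(dead, total))


def dixon_massey_rule(dead, total):
    """Rule quoted from [13]: no expected frequency below 1, at most 20% below 5."""
    cells = expected_cells(dead, total)
    return min(cells) >= 1 and sum(e < 5 for e in cells) <= 0.2 * len(cells)


print(all_cells_at_least_5([4, 5], [20, 20]), dixon_massey_rule([4, 5], [20, 20]))
print(all_cells_at_least_5([10, 12, 9, 11], [25] * 4),
      dixon_massey_rule([10, 12, 9, 11], [25] * 4))
```

DeFoe's group 5 fails both rules (expected dead is 4.5 per tank), while the hypothetical group, with expected counts of 10.5 dead and 14.5 live per tank, passes both.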

Thus  we  wish  to  pool  across  groups  the  results  of  tests  of 
homogeneity  of  responses  among  tanks  within  groups.  Some  of  these 
tests  are  based  on  asymptotic  theory  while  others  are  based  on 
exact,  small  sample  theory. 

We have developed a computer program, EXAX2, to carry out
such a procedure. We discuss this program in detail and illustrate
its application in the following section.


Dixon and Massey [13], page 233, have a slightly more liberal
criterion, namely "...none of the F_i's (i.e. expected frequencies) is
less than 1 and not more than 20% of the F_i's are less than 5..."
Again, only group 5 would satisfy this.
Figure VII.1  Probit fit to individual responses, DeFoe compound C
(excluding control group)

Figure VII.2  Probit fit to responses pooled across tanks within groups,
DeFoe compound C (excluding control group)

Figure VII.3  Chi square test of homogeneity based on individual responses,
DeFoe compound C fry mortality

Figure VII.4  Chi square test of homogeneity based on responses pooled across
tanks within groups, DeFoe compound C fry mortality

Figure VII.6  Chi square test of homogeneity of percent fry mortality in
group 2. Data from Holcombe and Phipps compound D

Figure VII.7  Chi square test of homogeneity of percent fry mortality in
group 3. Data from Holcombe and Phipps compound D

Figure VII.8  Chi square test of homogeneity of percent fry mortality in
group 4. Data from Holcombe and Phipps compound D

Figure VII.9  Chi square test of homogeneity of percent fry mortality in
group 5. Data from Holcombe and Phipps compound D

Figure VII.10  Chi square test of homogeneity of percent fry mortality in
group 6. Data from Holcombe and Phipps compound D


VIII.  EXAX2: A COMPUTER PROGRAM TO TEST FOR HETEROGENEITY OF RESPONSES
AMONG TANKS WITHIN GROUPS


We saw in the previous section for DeFoe's fry mortality responses
from the test on compound C that we obtained diametrically opposite
impressions about the existence of tank to tank heterogeneity within
groups depending on whether we based our homogeneity test on a probit
model fit or on a contingency table fit. This was attributed to

1.  Differences in weights used in the denominators of the chi
square statistics (based on the assumed model)

2.  Small expected frequencies that produce substantial departures
from the asymptotic distribution theory.

To take account of problem (1), we use the strategy of carrying out
separate chi square heterogeneity tests within each concentration group,
without imposing any structure on the concentration-response relation.


There is, however, a technical problem associated with this scheme.
For many (if not most) of the responses of interest the response
probabilities of occurrence are fairly close to 0 or 1. Therefore the
expected frequencies of occurrence can be rather small, thus invalidating
the asymptotic chi square theory upon which most of the standard tests
are based. We saw this in connection with the fry mortality data from
the Holcombe and Phipps test on compound D.

We have developed a computer program, EXAX2, that overcomes this
problem. It carries out separate chi square tests within each treatment
group, based on asymptotic theory when the expected frequencies within
cells are large enough and on exact, small sample theory when the
expected frequencies within cells are small. Thus heterogeneity tests
using EXAX2 are applicable even with the relatively small sample sizes
and relatively extreme response rates encountered in fish toxicity
tests. The theory underlying the program and instructions for its use
are described in [14], which is included as Appendix AVIII.2. In the
body of this section we describe the basis of the calculations in EXAX2
and illustrate its application with examples.
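The per-group chi square of step 1 below, its asymptotic significance level, and the expected-frequency rule of steps 2 and 3 can be sketched in Python as follows. This is a minimal sketch, not EXAX2 itself: the function names are hypothetical, the pooled estimate $p_i$ is assumed to be $\sum_j X_{ij} / \sum_j N_{ij}$, and the chi square tail probability uses the standard closed-form series for integer degrees of freedom instead of a library routine. The example counts (23, 21, 16 and 19 deaths out of 25 per tank) are the Holcombe and Phipps group 5 fry data used in Section IX.

```python
import math


def chi_square_2xk(deaths, totals):
    """Chi square for homogeneity of the K tanks in one group (a 2 x K table).

    deaths[j] = X_ij and totals[j] = N_ij. The pooled estimate p_i is taken
    to be sum(X_ij) / sum(N_ij), an assumption of this sketch.
    """
    p = sum(deaths) / sum(totals)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 0.0  # degenerate table: 0% or 100% mortality
    return sum((x - n * p) ** 2 / (n * p * q) for x, n in zip(deaths, totals))


def chi_square_upper_tail(x, df):
    """P(chi square with integer df >= 1 exceeds x), by the closed-form series."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus the odd-order correction terms
    tail = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for k in range(1, (df - 1) // 2 + 1):
        tail += term
        term *= x / (2 * k + 1)
    return tail


def use_asymptotic(deaths, totals, cutoff=5.0):
    """Steps 2 and 3: asymptotic theory only if every expected count exceeds cutoff."""
    p = sum(deaths) / sum(totals)
    return all(min(n * p, n * (1.0 - p)) > cutoff for n in totals)


deaths, totals = [23, 21, 16, 19], [25, 25, 25, 25]
stat = chi_square_2xk(deaths, totals)
print(use_asymptotic(deaths, totals), stat, chi_square_upper_tail(stat, len(totals) - 1))
```

With these counts every expected frequency exceeds 5, the statistic is about 6.450 on K - 1 = 3 d.f., and the asymptotic significance level is about 0.092, which agrees with the tabulated $-2\ln A_5 = 4.77915$ for that group since exp(-4.77915/2) is about 0.0917.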

EXAX2 pools the results of tests for heterogeneity in each of
I (I > 1) independent 2 x K contingency tables (representing I groups,
K tanks per group). The homogeneity test within each group is based on
the usual chi square statistic, using either its asymptotic distribution
or its exact small sample distribution, as appropriate. The following
approach is used.


1. Within each concentration group, the chi square for homogeneity
among the K tanks is calculated. Let $X_{ij}$, $N_{ij}$ represent the
number of dead fish and the total number of fish, respectively, in
the j-th tank of the i-th group. Let $p_i$ be a pooled estimate of the
response probability in the i-th group and $q_i = 1 - p_i$. Then the
chi square for group i is

$$\chi_i^2 = \sum_{j=1}^{K} \frac{(X_{ij} - N_{ij}p_i)^2}{N_{ij}\,p_i\,q_i}$$

If $p_i = 0$ or $q_i = 0$ (corresponding to zero percent or 100 percent
observed mortality) the table is degenerate and $\chi_i^2 = 0$ by
definition.


2. If all the expected frequencies ($N_{ij}p_i$, $N_{ij}q_i$) are greater than
a specified cutoff value (we currently use five), asymptotic theory
is used. Thus the observed significance level of $\chi_i^2$ is
based on the chi square distribution with K - 1 d.f.

3. If one or more of the expected frequencies is less than the cutoff
level, then the exact distribution of $\chi_i^2$, conditional on the
observed marginal totals, is used. The observed significance
level is based on this exact distribution. This approach is
described in Agresti and Wackerly [15]. The exact distribution
of $\chi_i^2$ is computed by systematically enumerating all possible
tables having the given margins, using the algorithm in Boulton and
Wallace [16], together with their associated probabilities, due to
March [18].

4. Let $A_i$ denote the observed significance level in the i-th group.
We pool the $A_i$'s over groups to obtain an overall test by an
approach analogous to Fisher's method, as described in Littell
and Folks [19, 20]. For each group we calculate, based on exact
or asymptotic theory, $-2\ln A_i$ and its mean and variance under
the null hypothesis of homogeneity.

5. The observed significance levels are pooled into a single statistic
by calculating

$$Z = \frac{\left[\sum_{i=1}^{I}(-2\ln A_i)\right]^{1/2} - \left[\sum_{i=1}^{I}E(-2\ln A_i)\right]^{1/2}}{\left[\sum_{i=1}^{I}\mathrm{Var}(-2\ln A_i) \Big/ 4\sum_{i=1}^{I}E(-2\ln A_i)\right]^{1/2}}$$

Z is referred to a standard normal distribution. The null hypothesis
of tank to tank homogeneity is rejected for large values of Z. (The
square root transformation is used because, under asymptotic theory,
it is the variance stabilizing transformation for $\sum_i(-2\ln A_i)$,
and thus it probably improves the normal approximation.)
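Step 3 can be illustrated with a brute-force enumeration. The sketch below is a stand-in for EXAX2's routines under stated assumptions: it does not use the Boulton and Wallace [16] algorithm or March's [18] procedure, but simply lists every vector of tank death counts with the observed group total $T_i$, weights each table by its conditional (multivariate hypergeometric) probability $\prod_j \binom{N_{ij}}{x_j} / \binom{N_i}{T_i}$, where $N_i$ is the group's total number of fish, and then computes $A_i$, $E(-2\ln A_i)$ and $\mathrm{Var}(-2\ln A_i)$ under that exact null distribution. Chi square values that agree to within a small tolerance are treated as ties. All names are hypothetical.

```python
import math


def chi_square_2xk(deaths, totals):
    """Chi square for a 2 x K table; 0 for a degenerate (0% or 100%) table."""
    p = sum(deaths) / sum(totals)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 0.0
    return sum((x - n * p) ** 2 / (n * p * q) for x, n in zip(deaths, totals))


def tables_with_margins(totals, t):
    """Yield every deaths vector with 0 <= x_j <= totals[j] and sum equal to t."""
    if len(totals) == 1:
        if 0 <= t <= totals[0]:
            yield (t,)
        return
    rest = sum(totals[1:])
    for x0 in range(max(0, t - rest), min(totals[0], t) + 1):
        for tail in tables_with_margins(totals[1:], t - x0):
            yield (x0,) + tail


def exact_heterogeneity(deaths, totals, tol=1e-9):
    """Return (chi2_obs, A, E(-2 ln A), Var(-2 ln A)) under the exact conditional null."""
    n, t = sum(totals), sum(deaths)
    denom = math.comb(n, t)
    dist = []  # (chi square, conditional probability) for every table
    for x in tables_with_margins(totals, t):
        prob = math.prod(math.comb(m, k) for m, k in zip(totals, x)) / denom
        dist.append((chi_square_2xk(x, totals), prob))

    def upper_tail(c):
        # P(chi square >= c), counting values within tol of c as ties
        return min(1.0, sum(pr for c2, pr in dist if c2 >= c - tol))

    chi2_obs = chi_square_2xk(deaths, totals)
    a_obs = upper_tail(chi2_obs)
    ys = [(-2.0 * math.log(upper_tail(c)), pr) for c, pr in dist]
    mean = sum(y * pr for y, pr in ys)
    var = sum(y * y * pr for y, pr in ys) - mean ** 2
    return chi2_obs, a_obs, mean, var


print(exact_heterogeneity([2, 2, 1, 1], [25, 25, 25, 25]))
```

For the Holcombe and Phipps group 1 fry counts (2, 2, 1 and 1 deaths out of 25 per tank) this gives a chi square of 0.70922, $A_1 = 1$, and a mean and variance of $-2\ln A_1$ of about 1.354 and 2.788, close to the EXACT entries reported for that group. The number of tables grows quickly with K and the group total, so this brute-force version is only practical for small tables.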

In addition to calculating preliminary tests of tank to tank
heterogeneity within treatment groups, EXAX2 can carry out several
other statistical procedures useful in the analysis of data from
aquatic toxicity tests. In particular it can:

• Pool data across tanks within groups and test for heterogeneity
of response rate across groups by use of the chi square test and
either exact small sample theory or asymptotic large sample theory.

• Calculate confidence intervals on the odds ratios of treatment
groups to the control group using the exact noncentral distribution
of Fisher's exact test statistic.

These applications will be discussed in detail in a later section.

We now consider several illustrations of the use of EXAX2 for tests
of heterogeneity among tanks within groups. The EXAX2 outputs are
shown in the referenced figures. The observed and expected cell
frequencies are indicated. If any of the expected cell frequencies
are lower than the user-specified cutoff of 5, exact distribution
theory is used. The exact distribution of chi square, conditional on
the marginal totals, is enumerated and displayed. The observed value
of chi square, the observed significance level, $-2\ln A_i$,
$E(-2\ln A_i)$ and $\mathrm{Var}(-2\ln A_i)$ are calculated. The six
independent tests are combined by summing $-2\ln A_i$, $E(-2\ln A_i)$
and $\mathrm{Var}(-2\ln A_i)$ over groups and calculating Z.
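The combination step is the Z formula of step 5. A minimal Python sketch, with a hypothetical function name, taking the per-group $-2\ln A_i$, $E(-2\ln A_i)$ and $\mathrm{Var}(-2\ln A_i)$ values (the YY, EY and VARY columns printed by EXAX2) and returning Z together with its upper-tail standard normal probability computed from `math.erfc`:

```python
import math


def pooled_z(yy, ey, vary):
    """Combine per-group -2 ln A_i values into the overall Z of step 5.

    yy, ey, vary: per-group -2 ln A_i, E(-2 ln A_i) and Var(-2 ln A_i).
    Returns (Z, P(standard normal > Z)).
    """
    sy, ymu, svary = sum(yy), sum(ey), sum(vary)
    if ymu <= 0.0:
        raise ValueError("no informative (non-degenerate) groups to combine")
    z = (math.sqrt(sy) - math.sqrt(ymu)) / math.sqrt(svary / (4.0 * ymu))
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, upper_tail


# DeFoe compound C embryo mortality: YY column, asymptotic EY = 2 and VARY = 4
yy = [1.22540, 9.35371, 1.25338, 2.40541, 2.37286, 0.0]
print(pooled_z(yy, [2.0] * 6, [4.0] * 6))
```

Fed the DeFoe embryo mortality values tabulated below, it returns Z of about 0.865 with an upper-tail probability of about 0.19, matching the printed summary.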

DeFoe  compound  C 

a)  Embryo  mortality 

b)  Fry  mortality 

Holcombe  and  Phipps,  Compound  D 

a)  Embryo  mortality 

b)  Fry  mortality 

Jarvinen,  compound  B 

a)  Embryo  mortality 


b)  Fry  mortality 


DeFoe Compound C

Embryo Mortality

There are two tanks per treatment group, 50 embryos per tank. The
results from the EXAX2 calculations are shown in Figures VIII.1 to
VIII.6 and are summarized below. The pooled significance level
calculations (using Fisher's method) are presented below the results
for group 6. The probability of a standard normal deviate exceeding
0.865 is 0.19.


Embryo Mortality

Trt  Method   XSQOBS    AI        YY           EY            VARY
              (Chi sq)  (A_i)     (-2 ln A_i)  E(-2 ln A_i)  Var(-2 ln A_i)
 1   asympt   0.37205   0.54189   1.22540      2.00          4.00
 2   asympt   6.76271   0.00931   9.35371      2.00          4.00
 3   asympt   0.38610   0.53436   1.25338      2.00          4.00
 4   asympt   1.07250   0.30038   2.40541      2.00          4.00
 5   asympt   1.05086   0.30531   2.37286      2.00          4.00
 6   asympt   0.0       1.00      0.0          2.00          4.00

SY = 16.611   YMU = 12.00   SVARY = 24.00   Z = 0.86483

Except for group 2, where the response from tank 1 appears to be an
outlier, there is no statistical evidence of tank to tank heterogeneity
within groups.

Fry Mortality

There are two tanks per treatment group, 20 fry per tank. The results
from the EXAX2 calculations are shown in Figures VIII.7 to VIII.12 and
are summarized below. The pooled significance level calculations are
presented below the results for group 6. The probability of a standard
normal deviate exceeding 0.175 is 0.43.

Fry Mortality

Trt  Method           XSQOBS    AI        YY           EY            VARY
                      (Chi sq)  (A_i)     (-2 ln A_i)  E(-2 ln A_i)  Var(-2 ln A_i)
 1   (Row total = 0)                      0.0          0.0           0.0
 2   (Row total = 0)                      0.0          0.0           0.0
 3   EXACT            2.10526   0.48718   1.43825      0.70068       0.5168
 4   EXACT            1.07625   0.48780   1.43568      0.70033       0.51499
 5   EXACT            0.14337   1.00      0.0          1.12060       2.7544
 6   (Row total = 0)                      0.0          0.0           0.0

SY = 2.8739   YMU = 2.5216   SVARY = 3.7862   Z = 0.17516

Figures VIII.1 to VIII.36 are contained in Appendix AVIII.1.
Fry Mortality

Trt  Method           XSQOBS    AI        YY           EY            VARY
                      (Chi sq)  (A_i)     (-2 ln A_i)  E(-2 ln A_i)  Var(-2 ln A_i)
 1   EXACT            0.70922   1.00      0.0          1.3544        2.7875
 2   EXACT            1.08696   0.95647   0.08901      1.543         3.4415
 3   EXACT            7.06870   0.07579   5.15948      1.543         3.4415
 4   EXACT            5.21662   0.18667   3.35688      1.7349        2.6708
 5   asympt           6.44967   0.09167   4.77915      2.0           4.0
 6   (Row total = 0)            1.0000    0.0          0.0           0.0

SY = 13.385   YMU = 8.1752   SVARY = 17.341   Z = 1.09755

Groups 1, 2, 4 and 6 show no statistical evidence of tank to tank
heterogeneity. Groups 3 and 5 show some marginal suggestion of tank to
tank heterogeneity. It is interesting to note that, in direct analogy
with the results for embryo mortality, tank 3 of group 3 has about
twice the mortality of the other tanks in the group. This
"coincidence" should be investigated further to determine whether this
increased mortality has a systematic cause. Overall, Z = 1.10. The
probability of a standard normal random variable exceeding this value
by chance is 0.136. Thus there is at most a marginal suggestion of some
possible tank to tank variation, but nothing conclusive.

Jarvinen Compound B
Embryo Mortality

There are two tanks per treatment group, approximately 50 embryos
per tank (actually between 48 and 57, with an average of 51.2). The
results from the EXAX2 calculations are shown in Figures VIII.25 to
VIII.30 and are summarized below. The pooled significance level
calculations are given below the results for group 6. The probability
of a standard normal deviate exceeding 2.54 is 0.005.


Embryo Mortality

Trt  Method   XSQOBS    AI        YY           EY            VARY
              (Chi sq)  (A_i)     (-2 ln A_i)  E(-2 ln A_i)  Var(-2 ln A_i)
 1   asympt   6.51208   0.01071   9.07234      2.0           4.0
 2   EXACT    1.78430   0.27477   2.58361      1.52817       3.23307
 3   EXACT    3.05250   0.15951   3.67136      1.17120       2.77808
 4   EXACT    0.00085   1.0000    0.00000      1.04511       1.38125
 5   EXACT    0.74812   0.43704   1.65548      1.45242       2.94750
 6   EXACT    4.75938   0.05966   5.63828      1.52125       3.17408

SY = 22.62107   YMU = 8.71816   SVARY = 17.51398   Z = 2.54488


Groups 1 and 6 show significant differences between mortality rates
in replicate tanks. Overall (Z = 2.54) the heterogeneity statistic
is significant at the α = 0.005 level. Thus overall there is strong
statistical evidence of tank to tank heterogeneity.

Group 1 shows considerable tank to tank heterogeneity in response,
group 6 shows moderate tank to tank heterogeneity in response, and
group 3 shows marginal tank to tank heterogeneity in response.

If we plot mortality rate by group number we obtain the following:

[Figure: embryo mortality rate (vertical axis, 0 to 0.30) plotted against group number (horizontal axis, 1 to 6).]


In agreement with the DeFoe and the Holcombe and Phipps embryo mortality
results, we see no trend in embryo mortality rate with increasing
toxicant concentration. We see tank to tank heterogeneity in group 1
and, to a lesser extent, in groups 3 and 6. There is the suggestion
that the response from tank 1 of group 1 might be an outlier. This
will be considered further in section X.

Fry Mortality

There are two tanks per treatment group, approximately 15 fry per tank
(between 14 and 16, with an average of 14.9). The results from the
EXAX2 calculations are shown in Figures VIII.31 to VIII.36 and are
summarized below. The pooled significance level calculations are given
below the results for group 6. The probability of a standard normal
deviate exceeding -0.84 is 0.80.

Fry Mortality

Trt  Method           XSQOBS    AI        YY           EY            VARY
                      (Chi sq)  (A_i)     (-2 ln A_i)  E(-2 ln A_i)  Var(-2 ln A_i)
 1   (Row total = 0)                      0.0          0.0           0.0
 2   (Row total = 0)                      0.0          0.0           0.0
 3   (Row total = 0)                      0.0          0.0           0.0
 4   (Row total = 0)                      0.0          0.0           0.0
 5   asympt           0.13393   0.71439   0.67264      2.0           4.0
 6   (Row total = 0)                      0.0          0.0           0.0

SY = 0.67264   YMU = 2.0   SVARY = 4.0   Z = -0.84013


This test does not reveal the concentration-response curve very well:
in groups 1 to 4 no fry died, while in group 6 all the fry died. Thus the
tables are degenerate in 5 of the 6 treatment groups. In group 5 there
is no suggestion of tank to tank heterogeneity. Thus, overall, there
is no suggestion of tank to tank heterogeneity.

In summary, we have seen several different degrees of tank to
tank heterogeneity within groups in the three toxicity tests studied.
With respect to embryo mortality, the DeFoe test shows no suggestion
of heterogeneity with the exception of an isolated outlier, the
Holcombe and Phipps test reveals a possible suggestion of heterogeneity,
and the Jarvinen test reveals a strong suggestion of heterogeneity, but
this may also be due to an outlier. With respect to fry mortality,
the DeFoe and Jarvinen tests show no suggestion of tank to tank
heterogeneity within groups. The Holcombe and Phipps test shows a
possible suggestion of tank to tank heterogeneity, but the aberrant
looking response originates in precisely the same tank as does the
aberrant looking embryo mortality response. This raises questions
about both responses. In brief, there does not appear to be very
much tank to tank heterogeneity within groups, and that which does
occur may be due to isolated outlying results.


IX. ADJUSTMENTS TO ACCOUNT FOR TANK TO TANK HETEROGENEITY WITHIN TREATMENT GROUPS


A. Background, Derivations, and Discussion


We have considered the problem of testing for tank to tank
heterogeneity within treatment groups. The results of such tests
will influence the way we treat the data in subsequent analyses.
Many methods for analyzing qualitative dose response data tacitly
assume that there is no tank to tank variation in response rates
within groups; binomial distribution theory is applied to data
pooled across tanks within groups. Sometimes the assumption of
no tank to tank heterogeneity is reasonable, as we have
seen with the DeFoe and Jarvinen fry mortality data. Sometimes
there is borderline statistical evidence of tank to tank heterogeneity,
as was the case with the Holcombe and Phipps test
(for both the embryo mortality and the fry mortality data). In other
situations, such as the embryo mortality data from Jarvinen's
test on methyl parathion encapsulated, there is stronger statistical
evidence of tank to tank heterogeneity in at least some
of the treatment groups.


In  this  section  we  consider  methods  for  accounting  for  tank 
to  tank  heterogeneity  when  it  exists.  Three  main  approaches  are 
possible. 


1.  We  can  formulate  models  that  explicitly  account  for  tank  to 
tank  heterogeneity  within  groups  and  fit  these  models  to  the 
data  by  specialized  techniques  such  as  maximum  likelihood 
estimation,  using  special  purpose  computer  programs.  Two 
such  models  are  the  beta  binomial  [21]  and  the  correlated 
binomial  [22].  This  approach  requires  the  formulation  of 
specialized  models  and  development  of  specialized  programs 
to  implement  these  analyses.  Thus  such  analyses  will  be 
difficult  for  experimenters  to  carry  out  and  the  results  of 
such  analyses  will  be  more  difficult  to  interpret. 


2. We can carry out analyses on a per tank basis rather than
on a per fish basis. That is, summary values such as
percent mortality, average weight gain, etc. are calculated
within each tank and are then used as basic values for subsequent
analyses. This is currently the most commonly used
approach for analyzing fish toxicity data. While it does
correctly account for possible tank to tank heterogeneity,
it does so at the cost of a considerable reduction in sensitivity.
Namely, the data from perhaps 50 to 100 fish or embryos
per group are summarized by just two to four summary values.
This leaves very few degrees of freedom for estimating error
and so diminishes the sensitivity of the subsequent procedures.
3. We can adjust the data to reflect the increased variability
due to tank to tank heterogeneity and then use "standard",
binomial based techniques on the adjusted "data". This third
approach is a workmanlike one and has the dual virtues
of being simple to carry out and of permitting the use of
"standard" statistical procedures and computer programs for
subsequent analyses.

Heterogeneity among tanks within groups can alternatively be regarded
as correlation among the responses of the various fish within the
same tank. Such correlation is usually positive, and this has the effect
of increasing the variability of statistics over and above that which
would be assumed under a binomial model.

The increased variability can be accounted for by reducing the
actual sample size in each tank to an effective sample size and then
disregarding the correlation. The number of responses is reduced
proportionately, so that the observed response rate within each tank
remains constant. Suppose, for instance, there are 40 embryos in a tank
and 8 die. We thus have an observed response rate of 0.20. Suppose that
the responses within each tank are positively correlated and the
variance of $\hat p$ is inflated 20 percent by this correlation. That is,

$$\mathrm{Var}(\hat p) = \frac{1.2\,p(1 - p)}{40}$$

Then we can regard the effective sample size within that tank as
40/1.2 = 33.33. To maintain the response rate at the observed level of
0.20 we adjust the number of responses down to a corresponding effective
number 8/1.2 = 6.67. We then analyze the data from this tank by ignoring
the correlation and treating the data as if we have 6.67 responses in
33.33 trials. All the standard analysis procedures, predicated on the
assumption of no tank to tank heterogeneity within groups, can be
applied to the modified "data".
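The arithmetic of this example can be written as a small helper. This is a sketch with a hypothetical name: it divides both the number of responses and the number of trials by the variance inflation factor, so that the observed rate is unchanged, and it rejects factors below 1, in keeping with note 3 later in this section.

```python
def effective_counts(responses, n, inflation):
    """Scale a tank's responses and sample size down by a variance inflation factor.

    Both counts are divided by the same factor, so responses / n is unchanged.
    """
    if inflation < 1.0:
        raise ValueError("inflation factor must be at least 1")
    return responses / inflation, n / inflation


# 8 deaths among 40 embryos, variance inflated by 20 percent
print(effective_counts(8, 40, 1.2))
```

Here the call returns approximately (6.67, 33.33), the effective responses and trials used in the text.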


The  per  tank  analyses  mentioned  in  paragraph  2  can  be  regarded 
as  a  limiting  case  of  data  adjustment  where  we  adjust  the  effective 
sample  sizes  within  tanks  all  the  way  down  to  1. 

We now consider the calculation of adjustment factors. Motivation
for the adjustment procedure comes from the form of the beta binomial
model [21]. Namely, suppose $X_{ij}$ is the number of responses within
tank j of treatment group i. The beta binomial model extends the binomial
model to allow for tank to tank variation within groups. Thus we assume

$$X_{ij} \sim \mathrm{Binomial}(N_{ij}, p_{ij}) \text{ conditional on } p_{ij}, \qquad j = 1, \ldots, J, \quad i = 1, \ldots, I$$

where $p_{ij} \sim \mathrm{Beta}(\alpha_i, \beta_i)$ and the $N_{ij}$ are fixed. Let

$$\mu_i = \frac{\alpha_i}{\alpha_i + \beta_i}, \qquad \theta_i = \frac{1}{\alpha_i + \beta_i}$$

Then

$$E(p_{ij}) = \mu_i, \qquad \mathrm{Var}(p_{ij}) = \mu_i(1 - \mu_i)\frac{\theta_i}{1 + \theta_i}$$

When $\theta_i = 0$, $\mathrm{Var}(p_{ij}) = 0$ and we are back to the case of no tank to tank
heterogeneity, at least within the i-th group. The larger $\theta_i$ is, the
greater is the extent of tank to tank heterogeneity. $\theta_i$ varies between
0 and ∞.


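Now consider the distribution of $X_{ij}$:

$$\mathcal{L}(X_{ij} \mid p_{ij}) \sim \mathrm{Binomial}(N_{ij}, p_{ij}), \qquad \mathcal{L}(p_{ij}) \sim \mathrm{Beta}(\alpha_i, \beta_i)$$

These two facts imply that $X_{ij}$ has the marginal beta binomial
distribution with probability function

$$P(X_{ij} = x) = \binom{N_{ij}}{x}\frac{B(x + \alpha_i,\; N_{ij} - x + \beta_i)}{B(\alpha_i, \beta_i)}, \qquad x = 0, 1, \ldots, N_{ij}$$

where B denotes the beta function.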


It can be shown directly that

$$E(X_{ij}) = N_{ij}\mu_i, \qquad \mathrm{Var}(X_{ij}) = N_{ij}\mu_i(1 - \mu_i)\,\frac{1 + N_{ij}\theta_i}{1 + \theta_i}, \qquad 0 < \theta_i < \infty$$

We see that the variance of $X_{ij}$ is inflated over and above a binomial
variance by a multiplicative factor.
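The beta binomial moments quoted above can be checked numerically. The sketch below (hypothetical function names) evaluates the beta binomial probability function with `math.lgamma` and `math.comb`, sums it to get the mean and variance directly, and compares them with $N\mu(1-\mu)(1+N\theta)/(1+\theta)$. The values N = 10, α = 2, β = 3 are made-up illustrative inputs, giving μ = 0.4 and θ = 0.2.

```python
import math


def log_beta(a, b):
    """log B(a, b) via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(x, n, alpha, beta):
    """P(X = x) = C(n, x) B(x + alpha, n - x + beta) / B(alpha, beta)."""
    return math.comb(n, x) * math.exp(log_beta(x + alpha, n - x + beta) - log_beta(alpha, beta))


def moments_by_summation(n, alpha, beta):
    """Mean and variance obtained by summing over the probability function."""
    probs = [beta_binomial_pmf(x, n, alpha, beta) for x in range(n + 1)]
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
    return mean, var


def moments_by_formula(n, alpha, beta):
    """Mean and variance from the closed forms in the text."""
    mu = alpha / (alpha + beta)
    theta = 1.0 / (alpha + beta)
    return n * mu, n * mu * (1.0 - mu) * (1.0 + n * theta) / (1.0 + theta)


print(moments_by_summation(10, 2.0, 3.0), moments_by_formula(10, 2.0, 3.0))
```

Both routes give a mean of 4 and a variance of 6, that is, a binomial variance of 2.4 inflated by the factor (1 + 10 × 0.2)/(1 + 0.2) = 2.5.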

Suppose $N_{ij} = N_i$, j = 1, ..., J. This assumption is reasonable in
fish toxicity tests, where $N_{ij}$ represents the number of fish or embryos
within the j-th tank of the i-th group. In this case the multiplicative
factor becomes $K_i = (1 + N_i\theta_i)/(1 + \theta_i)$, j = 1, ..., J. Thus
$\mathrm{Var}(X_{ij}) = N_i K_i \mu_i(1 - \mu_i)$, where $1 \le K_i \le N_i$. Define
$\hat p_{ij} = X_{ij}/N_i$; $\hat p_{ij}$ is the observed response proportion.

Therefore

$$\mathrm{Var}(\hat p_{ij}) = \frac{K_i}{N_i}\,\mu_i(1 - \mu_i), \qquad j = 1, \ldots, J$$

Thus the effective sample size is $N_i/K_i$. As the extent of tank to tank
heterogeneity approaches 0 (i.e. $\theta_i \to 0$), $K_i$ approaches 1 and $N_i/K_i$
approaches $N_i$. As the extent of tank to tank heterogeneity gets greater
and greater ($\theta_i \to \infty$), $K_i$ approaches $N_i$, and so $N_i/K_i$ approaches 1.
Thus the two extreme situations are not adjusting the within tank sample
sizes at all and adjusting the within tank sample size down to 1. The latter
adjustment resembles performing analyses on a per tank basis rather than on
a per fish basis. Thus method 2 for accounting for tank to tank heterogeneity
can be regarded as an extreme case of method 3. Note that if $N_i = N$ and
$\theta_i = \theta$ for all i, then $K_i = K$ for all i.
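The limiting behaviour of $K_i$ can be seen directly by coding the factor. A minimal sketch; the function names and the θ values tried are illustrative assumptions:

```python
def inflation_factor(n, theta):
    """Beta binomial variance inflation K = (1 + n*theta) / (1 + theta)."""
    return (1.0 + n * theta) / (1.0 + theta)


def effective_sample_size(n, theta):
    """Effective number of independent fish in a tank of n: n / K."""
    return n / inflation_factor(n, theta)


for theta in (0.0, 0.05, 1.0, 1e6):
    print(theta, inflation_factor(25, theta), effective_sample_size(25, theta))
```

With N = 25, θ = 0 gives K = 1 (no adjustment), a very large θ gives K close to 25 and an effective sample size close to 1, and intermediate values of θ fall in between.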

The  procedure  suggested  here  for  calculating  adjustment  factors 
is  motivated  by  the  results  based  on  beta  binomial  theory,  but  is  simpler 
to  carry  out. 

Let $X_{ij}$, $N_{ij}$ denote the number of responses and the total number of fish,
respectively, within tank j of group i, j = 1, ..., J. Let $\hat p_{ij} = X_{ij}/N_{ij}$.
Let $p_i$ denote the average response rate within the i-th group. The actual
variance of $\hat p_{ij}$ is the binomial theory variance multiplied by the
inflation factor $K_i$. Thus

$$\mathrm{Var}(\hat p_{ij}) = K_i\,[p_i(1 - p_i)/N_i]$$

This suggests that we can estimate $K_i$ by estimating $\mathrm{Var}(\hat p_{ij})$ and $p_i$ by
their sample analogues. Let

$$\bar N_i = J^{-1}\sum_{j=1}^{J} N_{ij}, \qquad \bar p_i = J^{-1}\sum_{j=1}^{J}\hat p_{ij}, \qquad \widehat{\mathrm{Var}}(\hat p_{ij}) = (J - 1)^{-1}\sum_{j=1}^{J}(\hat p_{ij} - \bar p_i)^2$$

denote the average sample size, the average observed response rate, and
the sample variance of response rates within the i-th group, respectively.
Note that the $N_{ij}$'s are generally nearly equal in fish toxicity data. We
estimate $K_i$ as

$$\hat K_i = \widehat{\mathrm{Var}}(\hat p_{ij}) \Big/ [\bar p_i(1 - \bar p_i)/\bar N_i]$$

The numerator of this ratio is the observed variance among the $\hat p_{ij}$'s,
while the denominator is the variance that would be expected just due
to binomial variation. We adjust each $X_{ij}$, $N_{ij}$ in the group downward by
a factor $\hat K_i$.
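The ratio-of-variances estimate can be computed as follows. This is a sketch with a hypothetical function name, using Python's `statistics` module for the average rate and the sample variance (divisor J - 1); it returns `None` when $\bar p_i(1 - \bar p_i) = 0$, the indeterminate case that arises with 100 percent mortality in the illustration below.

```python
import statistics


def k_hat(deaths, totals):
    """Ratio-of-variances estimate of K_i for one group (hypothetical helper).

    Observed sample variance of the tank rates (divisor J - 1) divided by the
    binomial variance p_bar (1 - p_bar) / N_bar. Returns None if that is zero.
    """
    rates = [x / n for x, n in zip(deaths, totals)]
    p_bar = statistics.mean(rates)
    n_bar = statistics.mean(totals)
    binomial_var = p_bar * (1.0 - p_bar) / n_bar
    if binomial_var == 0.0:
        return None  # indeterminate, e.g. 0% or 100% mortality in every tank
    return statistics.variance(rates) / binomial_var


print(k_hat([2, 2, 1, 1], [25] * 4), k_hat([2, 0, 5, 1], [25] * 4))
```

For the Holcombe and Phipps fry counts used in the illustration below, group 1 (2, 2, 1 and 1 deaths out of 25) gives about 0.236 and group 3 (2, 0, 5 and 1 deaths) about 2.536.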


Notes:

1. $K_i$ is necessarily greater than 1, but $\hat K_i$ may not be. If $\hat K_i < 1$, or
if there is no statistical evidence of tank to tank heterogeneity,
then we should not adjust sample sizes.

2. Assuming binomial theory when there is in fact tank to tank
heterogeneity results in underestimation of the variabilities of
the various statistics calculated. Thus hypothesis tests comparing
treatment group and control group response rates will reject more
often than they should, thereby resulting in underestimation of no-effect
levels. However, the opposite effect occurs with respect to
inferences about safe concentrations based on dose response curves.
Underestimation of variability results in overly large lower confidence
bounds on the safe concentration. A nominal 95% lower confidence
bound may in fact be just an 80% lower confidence bound.

3. The decision as to when to adjust sample sizes downward should be
reasonably liberal, perhaps when there is statistical evidence of
tank to tank heterogeneity at the α = .20 or the α = .25 level.
However, the adjustment factor used must always be greater than or equal to 1.

4. The calculation of $\hat K_i$ by means of ratios of variances is inefficient.
A more precise way of determining $K_i$ would be to estimate
$\theta_i$ from the data by maximum likelihood estimation and substitute
this estimate, $\hat\theta_i$, into the expression for $K_i$. Such an estimate
would always be greater than or equal to 1. However, such an approach
would require special purpose programs. The estimation of $K_i$
as discussed in this section is simpler and can be carried out by
hand calculation. In the future we will look into the calculation
of the $\hat K_i$'s by means of maximum likelihood estimation based on the
beta binomial model.

5. We can calculate separate $\hat K_i$'s for each treatment group based on
the responses solely from the tanks in that group. Alternatively,
if $N_i = N$ for all i and if $\theta_i = \theta$ for all i, then $K_i = K$ for all
i, and we can calculate a common inflation factor K for all
treatment groups. The question of whether we should fit separate
adjustment factors within each group or a single common factor is
a research problem in its own right. We defer the answer to that
question to future work and in this report confine attention to
fitting a common adjustment factor for all groups.

If it is sensible on biological grounds, results on tank to
tank heterogeneity observed in previous similar tests might be
combined with current results to obtain a more accurate adjustment
factor.

6. The adjustment procedure might take into account the statistical
precisions of the estimates $\hat K_i$, $\hat K$. A conservative way to do this
would be to use upper confidence bounds on $K_i$, K as adjustment
factors rather than the point estimates. This modification will
also await future work.
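Notes 1 and 3 suggest a simple rule for the factor actually applied. The sketch below is a hypothetical encoding of that rule, not a procedure specified in this report: it adjusts only when the preliminary heterogeneity test is significant at a liberal level (α = 0.20 by default) and never uses a factor below 1. The illustrations that follow instead average the unclamped $\hat K_i$ values.

```python
def adjustment_factor(k_estimate, heterogeneity_p, alpha=0.20):
    """Factor actually applied to sample sizes (hypothetical rule from notes 1 and 3).

    k_estimate: estimated inflation factor (None if indeterminate).
    heterogeneity_p: observed significance level of the preliminary test.
    """
    if k_estimate is None or heterogeneity_p >= alpha:
        return 1.0  # no convincing heterogeneity: leave sample sizes alone
    return max(1.0, k_estimate)  # never shrink by a factor below 1


print(adjustment_factor(1.337, 0.14), adjustment_factor(0.236, 0.14))
```

With an observed significance level of 0.14, an estimate of 1.337 would be used as is, while an estimate of 0.236 would be raised to 1.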

B.  Illustrations 


We illustrate the application of this adjustment procedure to several
sets of data. First consider the fry mortality data of Holcombe and Phipps
for compound D. From the preliminary test of tank to tank heterogeneity
within treatment groups we conclude that there is at most marginal
statistical evidence of tank to tank heterogeneity within treatment
groups. (The observed significance level is 0.14.) In this example
J = 4 and $N_{ij} = 25$ for all i, j.

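Group 1: $\hat p_{11} = 0.08$, $\hat p_{12} = 0.08$, $\hat p_{13} = 0.04$, $\hat p_{14} = 0.04$, $\bar p_1 = 0.06$
$\bar N_1 = 25$, $\widehat{\mathrm{Var}}(\hat p_{1j}) = 0.00053$, $\bar p_1(1 - \bar p_1)/\bar N_1 = 0.00226$
$\hat K_1 = \widehat{\mathrm{Var}}(\hat p_{1j}) / [\bar p_1(1 - \bar p_1)/\bar N_1] = 0.00053/0.00226 = 0.236$

Group 2: $\hat p_{21} = 0.12$, $\hat p_{22} = 0.04$, $\hat p_{23} = 0.08$, $\hat p_{24} = 0.08$, $\bar p_2 = 0.08$
$\bar N_2 = 25$, $\widehat{\mathrm{Var}}(\hat p_{2j}) = 0.00107$, $\bar p_2(1 - \bar p_2)/\bar N_2 = 0.00294$
$\hat K_2 = 0.00107/0.00294 = 0.362$

Group 3: $\hat p_{31} = 0.08$, $\hat p_{32} = 0.00$, $\hat p_{33} = 0.20$, $\hat p_{34} = 0.04$, $\bar p_3 = 0.08$
$\bar N_3 = 25$, $\widehat{\mathrm{Var}}(\hat p_{3j}) = 0.00747$, $\bar p_3(1 - \bar p_3)/\bar N_3 = 0.00294$
$\hat K_3 = 0.00747/0.00294 = 2.536$

Group 4: $\hat p_{41} = 0.16$, $\hat p_{42} = 0.20$, $\hat p_{43} = 0.16$, $\hat p_{44} = 0.00$, $\bar p_4 = 0.13$
$\bar N_4 = 25$, $\widehat{\mathrm{Var}}(\hat p_{4j}) = 0.00787$, $\bar p_4(1 - \bar p_4)/\bar N_4 = 0.00452$
$\hat K_4 = 0.00787/0.00452 = 1.739$

Group 5: $\hat p_{51} = 0.92$, $\hat p_{52} = 0.84$, $\hat p_{53} = 0.64$, $\hat p_{54} = 0.76$, $\bar p_5 = 0.79$
$\bar N_5 = 25$, $\widehat{\mathrm{Var}}(\hat p_{5j}) = 0.01427$, $\bar p_5(1 - \bar p_5)/\bar N_5 = 0.00664$
$\hat K_5 = 0.01427/0.00664 = 2.150$

Group 6: $\hat p_{61} = \hat p_{62} = \hat p_{63} = \hat p_{64} = 1.0$, $\bar p_6 = 1.0$, $\bar N_6 = 25$,
$\widehat{\mathrm{Var}}(\hat p_{6j}) = \bar p_6(1 - \bar p_6)/\bar N_6 = 0.0$
$\hat K_6$ is indeterminate and so we take it to be 1.0.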


Thus,

$$\hat K = \frac{1}{6}\sum_{i=1}^{6}\hat K_i = \frac{0.236 + 0.362 + 2.536 + 1.739 + 2.150 + 1.000}{6} = 1.337$$


$\hat K$ is the average adjustment factor, which is used to adjust
the observed sample sizes to effective sample sizes. The sample sizes
are adjusted downward so as to maintain the observed response rates within
each tank. The results of the adjustment procedure are presented in
Table IX.1. These adjusted values are used as basic input "data" for
subsequent analyses. We then proceed as if there is no tank to tank
variation within groups. The extrabinomial variation has been accounted
for by the adjustment procedure.
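The averaging and the adjustment behind Table IX.1 can be reproduced in a few lines. This is a sketch with hypothetical function names: it takes the six $\hat K_i$ values listed above, averages them, and divides each tank's deaths, survivors and total by the common factor, rounding to two decimals as in the table.

```python
def common_factor(k_values):
    """Average of the group adjustment factors."""
    return sum(k_values) / len(k_values)


def adjust_group(deaths, totals, k):
    """Effective (dead, live, total) for each tank, rounded to two decimals."""
    return [(round(x / k, 2), round((n - x) / k, 2), round(n / k, 2))
            for x, n in zip(deaths, totals)]


k = common_factor([0.236, 0.362, 2.536, 1.739, 2.150, 1.000])
print(round(k, 3))
print(adjust_group([2, 2, 1, 1], [25] * 4, k))  # group 1 fry deaths out of 25
```

The common factor comes out at about 1.337, so each tank of 25 fry becomes 18.70 effective fry and 2 deaths become 1.50 effective deaths, as in group 1 of the table.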


TABLE IX.1 EFFECTIVE SAMPLE SIZES AND RESPONSES IN HOLCOMBE AND PHIPPS
COMPOUND D FRY MORTALITY DATA AFTER ADJUSTMENT
FOR TANK TO TANK HETEROGENEITY

Group          Tank A   Tank B   Tank C   Tank D
  1    Dead      1.50     1.50     0.75     0.75
       Live     17.20    17.20    17.95    17.95
       Total    18.70    18.70    18.70    18.70
  2    Dead      2.24     0.75     1.50     1.50
       Live     16.45    17.95    17.20    17.20
       Total    18.70    18.70    18.70    18.70
  3    Dead      1.50     0.00     3.74     0.75
       Live     17.20    18.70    14.96    17.96
       Total    18.70    18.70    18.70    18.70
  4    Dead      2.99     3.74     2.99     0.00
       Live     15.70    14.96    15.70    18.70
       Total    18.70    18.70    18.70    18.70
  5    Dead     17.20    15.70    11.97    14.21
       Live      1.50     2.99     6.73     4.49
       Total    18.70    18.70    18.70    18.70
  6    Dead     18.70    18.70    18.70    18.70
       Live      0.00     0.00     0.00     0.00
       Total    18.70    18.70    18.70    18.70

We next illustrate the adjustment procedure on the embryo
mortality data of Jarvinen for compound B. The preliminary test
of tank to tank heterogeneity within treatment groups is highly significant
(Z = 2.54, corresponding to an observed significance level of 0.005).
This statistical significance is due to group 1 (control), which shows
strong tank to tank differences, group 6, which shows a moderate tank to
tank difference, and group 3, which shows possible indications (but at
best weak statistical evidence) of tank to tank heterogeneity.

In this example J = 2, and the $N_{ij}$ are close to $\bar N_i$ (within 1) except for
group 2. We will assume here that $N_{ij} = \bar N_i$ when calculating the $\hat K_i$'s. This
assumption can be refined somewhat, if necessary, to calculate separate
adjustment factors for each tank, but we will not do that here. This will
await the development of adjustment procedures based on maximum likelihood
estimation.
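In the calculations below, the group rate $\bar p_i$ is formed as a pooled ratio (for example 14/101 in group 1) rather than as the mean of the tank rates. A sketch of that variant follows, with a hypothetical function name. The death counts passed to it are reconstructed from the quoted rates and tank sizes (for example 0.04 × 50 = 2 and 0.14 × 50 = 7 in group 3), and the sample variance of the tank rates is taken about their mean.

```python
def k_hat_pooled(deaths, totals):
    """Inflation-factor estimate with a pooled group rate (hypothetical helper).

    deaths[j], totals[j]: responses and fish in tank j of one group.
    Returns None when the pooled rate is 0 or 1 (indeterminate).
    """
    rates = [x / n for x, n in zip(deaths, totals)]
    j = len(rates)
    mean_rate = sum(rates) / j
    # Sample variance of the tank response rates (divisor J - 1)
    sample_var = sum((r - mean_rate) ** 2 for r in rates) / (j - 1)
    p_bar = sum(deaths) / sum(totals)  # pooled rate, e.g. 14/101
    n_bar = sum(totals) / j            # average tank size
    binomial_var = p_bar * (1.0 - p_bar) / n_bar
    if binomial_var == 0.0:
        return None
    return sample_var / binomial_var


print(k_hat_pooled([2, 7], [50, 50]), k_hat_pooled([1, 1], [50, 48]))
```

Group 3 gives about 3.053 and group 4 (one death in each of two tanks of 50 and 48) about 0.0009, effectively 0, as below. Small differences from the hand calculations, for example 4.76 rather than 4.77 for group 6, come from the rounding of intermediate values.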


Group  1:  pu  =  .235,  pl2  =  .04,  ?1  =  14/101  =  0.138,  Nn  =  51,  N12  = 

50,  N  =  50.5 

Var(p  )  =  0.019  p^l  -  p^/N  =  0.00236 


K, 


Var(p. .) 


[P1(l  ~  Pj^/Nj] 


.019 

.00236 


=  8.056 


Group  2:  p01  =  .105,  p22  =  .038,  p9  =  8/109  =  .073,  N?1  =  57,  N??  =  52, 


21 

N2  =  54.5 


21  ’  22 


Var(p2 J  =  .0022 


p2(i  =  p2)/n2  = 


.00124 


Group  3:  p  =  .04,  =  1.4,  p  =  9/100  =  .09,  =  N3  = 


Var (P^j )  =  -°05 


P3(l  -  P3)/N3  =  .0016 


Var(P3j} 

[p3 (i  -  p3)/n3] 


.0016 


=  3.053 


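Group 4: $p_{41} = .02$, $p_{42} = .021$, $\hat p_4 = 2/98 = .020$, $N_{41} = 50$, $N_{42} = 48$, $N_4 = 49$

$$\mathrm{Var}(p_{4j}) = 3.4 \times 10^{-7}, \qquad \hat p_4(1 - \hat p_4)/N_4 = .00041$$

$$K_4 = \frac{\mathrm{Var}(p_{4j})}{\hat p_4(1 - \hat p_4)/N_4} = \frac{3.4 \times 10^{-7}}{.00041} = 8 \times 10^{-4} \approx 0$$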


Group 5: $p_{51} = .077$, $p_{52} = .038$, $\hat p_5 = 6/105 = .057$, $N_{51} = 52$, $N_{52} = 53$, $N_5 = 52.5$

$$\mathrm{Var}(p_{5j}) = .00077, \qquad \hat p_5(1 - \hat p_5)/N_5 = .00102$$

$$K_5 = \frac{\mathrm{Var}(p_{5j})}{\hat p_5(1 - \hat p_5)/N_5} = .750$$


Group 6: $p_{61} = .02$, $p_{62} = .137$, $\hat p_6 = 8/101 = .079$, $N_{61} = 50$, $N_{62} = 51$, $N_6 = 50.5$

$$\mathrm{Var}(p_{6j}) = .00687, \qquad \hat p_6(1 - \hat p_6)/N_6 = .00144$$

$$K_6 = \frac{\mathrm{Var}(p_{6j})}{\hat p_6(1 - \hat p_6)/N_6} = \frac{.00687}{.00144} = 4.77$$
We assume that the relatively high embryo mortality in group 1, tank 1 is not an outlier, and that the inflation in variability is constant across groups. On those assumptions we calculate an average adjustment factor across treatment groups.


Thus,

$$\hat K = \frac{1}{6}\sum_{i=1}^{6} K_i = \frac{8.056 + 1.799 + 3.053 + 0 + 0.750 + 4.77}{6} = 3.07$$


$\hat K$ is the average adjustment factor across groups. We could instead use a separate adjustment factor for each group. Table IX.2 presents the results of the adjustment procedure. These adjusted values can serve as the basic input "data" for subsequent analyses. From here on we analyze the data as if there were no tank to tank heterogeneity within groups.
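A short script can check the group-by-group arithmetic above. It is a sketch and does not reproduce the authors' own procedure. The per-tank dead/total counts in the hypothetical `JARVINEN_B` dictionary are an assumption, back-calculated from the quoted proportions and tank sizes (for example, $0.235 \times 51 = 12$ dead). As in the text, each tank size is replaced by the group mean $N_i$. The script does no intermediate rounding, so individual factors can differ from the hand calculations in the second or third decimal: group 1 comes out near 8.07 rather than 8.056.

```python
from statistics import variance

# ASSUMPTION: per-tank (dead, embryos) counts back-calculated from the
# proportions and tank sizes quoted in the text; hypothetical reconstruction.
JARVINEN_B = {
    1: [(12, 51), (2, 50)],
    2: [(6, 57), (2, 52)],
    3: [(2, 50), (7, 50)],
    4: [(1, 50), (1, 48)],
    5: [(4, 52), (2, 53)],
    6: [(1, 50), (7, 51)],
}


def adjustment_factor(tanks):
    """K_i = sample variance of tank proportions / pooled binomial variance.

    As in the text, every tank size N_ij is replaced by the mean size N_i.
    """
    proportions = [dead / n for dead, n in tanks]
    total_dead = sum(dead for dead, _ in tanks)
    total_n = sum(n for _, n in tanks)
    p_hat = total_dead / total_n
    mean_n = total_n / len(tanks)
    binomial_var = p_hat * (1 - p_hat) / mean_n
    # statistics.variance uses the divisor J - 1, matching the hand calculation.
    return variance(proportions) / binomial_var


factors = {g: adjustment_factor(t) for g, t in JARVINEN_B.items()}
k_bar = sum(factors.values()) / len(factors)

for g, k in factors.items():
    print(f"group {g}: K = {k:.3f}")
print(f"average K = {k_bar:.2f}")
```

The averaged factor comes out at about 3.07, in line with $\hat K$ above.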
TABLE IX.2 EFFECTIVE SAMPLE SIZES AND RESPONSES IN JARVINEN COMPOUND B EMBRYO MORTALITY DATA AFTER ADJUSTMENT FOR TANK TO TANK HETEROGENEITY

Group    Tank A    Tank B


X. OUTLIER DETECTION PROCEDURES


A.  Background 


Another important preliminary analysis is the detection of responses that do not conform with the substantial majority of responses. Such exceptional responses are often called "outliers". Outlier detection procedures decide how extreme a response must be before we can rule out that its value is reasonably likely to be due just to random variation. For example, consider the percentage embryo mortality responses from DeFoe's test on compound C, displayed in Figure VI.1. We remarked that the mortality rate in group 2, tank A appears widely separated from the others. Can random variation explain such a separation, or is some systematic factor peculiar to this tank? Similarly, the percentage embryo mortality in group 1, tank A of Jarvinen's test on compound B is widely separated from the other responses. Can random variation reasonably explain a separation of this magnitude, or is some systematic factor peculiar to this tank?

Barnett and Lewis [23] describe a wide class of outlier detection procedures that screen out extreme responses which cannot reasonably be attributed to random variation. These include a procedure for binomial responses (section 3.4, pp. 122-124). It assumes $n$ independent responses $X_1, \ldots, X_n$, each binomially distributed with parameters $m$ and $p$, and bases the outlier test on the exact conditional distribution of $\max_j X_j$ given $\sum_j X_j$. In our data, $n$ is the number of tanks per group, $m$ is the number of embryos or fry per tank (assumed equal from tank to tank), and $X_j$ is the number of responses (e.g. dead embryos) in tank $j$. Their Table XIX (pp. 320-322) covers only $n > 3$, $m > 10$, $X_{(n)} = m, m - 1, m - 2$. This range is quite inadequate for the parameters and responses that arise in toxicity tests, so their exact conditional test is of limited use for our needs.

Barnett  and  Lewis  state,  on  page  123,  that  an  alternative, 
approximate  approach  to  outlier  detection  in  the  binomial  case 
is  to  transform  {Xj/m}  using  the  arc  sine  transformation  and 
then  apply  normal  theory  based  procedures  to  these  transformed 
values.  This  approach,  and  variants  on  it,  are  in  the  spirit  of 
the  methods  that  we  recommend  in  the  remainder  of  this  section. 

We  consider  both  graphical  and  numerical  procedures. 

The  theoretical  bases  of  our  suggested  methods  are  discussed 
in  Appendix  AX. 


B. Application of Outlier Detection Procedures to Fish Toxicity Data


We use the transformations discussed in Appendix AX to build graphical outlier detection procedures based on normal probability plotting, together with associated formal outlier detection tests. We apply these procedures to the following data sets:

DeFoe: compound C
    embryo mortality data
    fry mortality data
Holcombe and Phipps: compound D
    embryo mortality data
    fry mortality data
Jarvinen: compound B
    embryo mortality data


DeFoe compound C

I = 6, J = 2 (i.e. 6 groups, 2 tanks per group)

Embryo Mortality

Group 1  p̂ = .41; all expected frequencies are greater than 5.
Group 2  p̂ = .51; all expected frequencies are greater than 5.
Group 3  p̂ = .37; all expected frequencies are greater than 5.
Group 4  p̂ = .37; all expected frequencies are greater than 5.
Group 5  p̂ = .39; all expected frequencies are greater than 5.
Group 6  p̂ = .38; all expected frequencies are greater than 5.

We  apply  the  transformation  suggested  in  case  1.  Data  summaries  are 
given  below. 


Columns: X_j = dead embryos in tank j; N_j = embryos in tank j; p̂ = pooled group proportion; N = group total; z_j = standardized value.

Group  Tank  X_j  N_j   p̂     N_j p̂   (N_j p̂ q̂)^1/2   N_j/N    z_j
1      A     22   50   0.41   20.5     3.478           .50     0.610
1      B     19   50   0.41   20.5     3.478           .50    -0.610
2      A     32   50   0.51   25.5     3.535           .50     2.600
2      B     19   50   0.51   25.5     3.535           .50    -2.600
3      A     20   50   0.37   18.5     3.414           .50     0.621
3      B     17   50   0.37   18.5     3.414           .50    -0.621
4      A     16   50   0.37   18.5     3.414           .50    -1.036
4      B     21   50   0.37   18.5     3.414           .50     1.036
5      A     22   50   0.39   19.5     3.449           .50     1.025
5      B     17   50   0.39   19.5     3.449           .50    -1.025
6      A     19   50   0.38   19.0     3.432           .50     0
6      B     19   50   0.38   19.0     3.432           .50     0
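The standardized values in the last column can be reproduced with $z_j = (X_j - N_j\hat p)(N_j\hat p\hat q)^{-1/2}(1 - N_j/N)^{-1/2}$. We inferred this formula from the tabulated columns; Appendix AX, which defines the transformations, is not reproduced here. The helper name `standardize_binomial` is hypothetical.

```python
from math import sqrt


def standardize_binomial(counts):
    """Case 1 standardized values for one treatment group (inferred formula).

    counts: list of (X_j, N_j) = (responders, fish) for each tank.
    Returns (X_j - N_j*p_hat) / sqrt(N_j*p_hat*q_hat) / sqrt(1 - N_j/N),
    where p_hat is the pooled group proportion and N is the group total.
    """
    total_x = sum(x for x, _ in counts)
    total_n = sum(n for _, n in counts)
    p_hat = total_x / total_n
    q_hat = 1 - p_hat
    return [
        (x - n * p_hat) / sqrt(n * p_hat * q_hat) / sqrt(1 - n / total_n)
        for x, n in counts
    ]


# DeFoe compound C embryo mortality: (dead, embryos) for tanks A and B.
defoe_embryo = {
    1: [(22, 50), (19, 50)],
    2: [(32, 50), (19, 50)],
    3: [(20, 50), (17, 50)],
    4: [(16, 50), (21, 50)],
    5: [(22, 50), (17, 50)],
    6: [(19, 50), (19, 50)],
}
z = {g: standardize_binomial(c) for g, c in defoe_embryo.items()}
for g, values in z.items():
    print(g, [round(v, 3) for v in values])
```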

To prepare the normal probability plot, we order the standardized values and plot the i-th smallest against the plotting position 100 x (i - 0.5)/12 on the probability scale. The values are listed below.

 i   Ordered value   Plotting position
 1      -2.600              4.2
 2      -1.036             12.5
 3      -1.025             20.8
 4      -0.621             29.2
 5      -0.610             37.5
 6       0                 45.8
 7       0                 54.2
 8       0.610             62.5
 9       0.621             70.8
10       1.025             79.2
11       1.036             87.5
12       2.600             95.8
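To draw the plot on ordinary linear axes instead of normal probability paper, each plotting position can be converted to a standard normal quantile (a "normal score"). The sketch below uses only the Python standard library (`statistics.NormalDist` needs Python 3.8 or later). It sorts the twelve standardized values listed above, attaches the plotting positions $100(i - 0.5)/n$, and adds the matching normal scores. The function name is hypothetical.

```python
from statistics import NormalDist


def probability_plot_points(values):
    """Return (ordered value, plotting position in %, normal score) triples.

    The i-th smallest of n values gets plotting position 100 * (i - 0.5) / n;
    the normal score is the standard normal quantile at that probability.
    """
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, value in enumerate(ordered, start=1):
        prob = (i - 0.5) / n
        points.append((value, 100 * prob, std_normal.inv_cdf(prob)))
    return points


# Standardized DeFoe compound C embryo values from the table above.
defoe_z = [0.610, -0.610, 2.600, -2.600, 0.621, -0.621,
           -1.036, 1.036, 1.025, -1.025, 0.0, 0.0]

for value, position, score in probability_plot_points(defoe_z):
    print(f"{value:7.3f}  {position:5.1f}  {score:6.3f}")
```

Plotting the ordered values against the normal scores reproduces the normal probability plot. The standard normal reference line is then simply the diagonal through the origin with slope 1.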

Figure X.1 shows the normal probability plot of these points. The plot is perfectly symmetrical about 0 because J = 2: the standardized responses within a group have correlation -1.0. The effective sample size is thus 6(J - 1) = 6(2 - 1) = 6 independent observations. The reference line in the plot is the standard normal distribution function. Three patterns are possible:

- If the response rates are homogeneous within groups, the standardized values should lie near this line.
- If the data contain extrabinomial variation, that is, random tank to tank variation within groups, the points should lie along a line or curve with a steeper slope than the standard normal distribution function.
- If the data contain outlying responses, those responses should be far removed from the line or curve that typifies the bulk of the data.

The last situation applies here. The bulk of the data lie very nicely along the standard normal c.d.f. line, while the values for group 2 are far removed from it.

How strong is the statistical evidence that the apparent outliers did not occur just by chance? To answer this, we calculate the probability that the maximum absolute value of six independent standard normal random variables exceeds 2.600. More precisely, let $Z_1, Z_2, \ldots, Z_6$ be six independent standard normal random variables. Then

$$P\left[\max_{1 \le j \le 6} |Z_j| \ge 2.600\right] = P\left[\text{at least one } |Z_j| \ge 2.600\right] = 1 - P\left[\text{all } |Z_j| < 2.600\right] = 1 - \left\{P\left[|Z| < 2.600\right]\right\}^6 = 1 - (.9907)^6 = 0.055$$
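The tail calculations used throughout this section can be evaluated without normal tables. They cover the maximum absolute value of $m$ independent standard normals and, later, the one sided version for a minimum. A minimal sketch using `math.erf` and `math.erfc`; the function names are illustrative only.

```python
from math import erf, erfc, sqrt


def prob_abs_below(z):
    """P(|Z| < z) for a standard normal Z."""
    return erf(z / sqrt(2))


def prob_max_abs_exceeds(z, m):
    """P(max_j |Z_j| >= z) for m independent standard normals (two sided)."""
    return 1 - prob_abs_below(z) ** m


def prob_min_below(z, m):
    """P(min_j Z_j <= z) for m independent standard normals (one sided)."""
    phi_z = 0.5 * erfc(-z / sqrt(2))  # standard normal c.d.f. at z
    return 1 - (1 - phi_z) ** m


print(round(prob_max_abs_exceeds(2.600, 6), 3))
```

Examples:

- `prob_max_abs_exceeds(2.600, 6)` gives about 0.055.
- `prob_max_abs_exceeds(2.893, 18)` gives about 0.066.
- `prob_min_below(-2.126, 16)` gives about 0.24.
- With the same two sided probability, the pooled Tank 2A value gives `prob_max_abs_exceeds(3.385, 11)` of about 0.008.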

This  is  of  borderline  statistical  significance.  We  can  thus  infer  that 
based  on  this  test  there  is  marginal  statistical  evidence  that  group  2 
contains  an  outlying  tank. 


Figure VI.1 suggests that the response from Tank 2A is more than a marginal outlier. We can make the above outlier test more sensitive by incorporating additional information. Suppose there is no trend in response rates across groups. Then we can estimate the response rate from all 12 tanks and ignore the correction factor $(1 - N_j/N)^{-1/2}$, which is close to 1 when $N_j/N = 50/600$. In general this assumption will not hold. It seems reasonable in this example, however, given the appearance of Figure VI.1 and toxicological considerations (i.e. relatively little penetration of the chemical into the embryo). Based on 12 tanks, $\hat p = 243/600 = 0.405$. The largest standardized value comes from Tank 2A, namely


$$Z = \frac{32 - (50)(.405)}{\left[50(.405)(.595)\right]^{1/2}} = 3.385$$


What is the probability that the most extreme of 11 independent standard normal random variables exceeds 3.385 in absolute value?

$$P\left[\max_{1 \le j \le 11} |Z_j| > 3.385\right] = 1 - \left\{P\left[|Z| < 3.385\right]\right\}^{11} = 1 - (.9993)^{11} = 0.008$$

We  can  thus  infer  that,  with  the  additional  assumption  of  no  trend  in 
response  with  increasing  treatment  level,  there  is  strong  statistical 
evidence  that  the  response  rate  in  Tank  2A  is  an  outlying  value. 

Being an outlier does not, in and of itself, mean that the Tank 2A data should be discarded or disregarded. Rather, the investigator needs to reexamine the records for this tank to determine the reason for the atypical response. If the response is due to clerical error, experimental mishap, an outbreak of a disease unrelated to the toxicant, etc., then perhaps it is inappropriate and should be disregarded. If it represents normal biological variation, then it should be considered with the others. This is a matter for biological judgement. Outlier detection procedures are merely screening devices that direct attention to the places where such judgement needs to be applied.

DeFoe  compound  C  -  Fry  mortality 


Group 1  p̂ = 0;
Group 2  p̂ = 0;
Group 3  p̂ = 0.05;
Group 4  p̂ = 0.024;
Group 5  p̂ = 0.225; expected frequencies less than 5 in both tanks
Group 6  p̂ = 1.00 (q̂ = 0);

Thus  groups  1,  2,  3,  4,  6  correspond  to  the  Poisson  case  2.  (In  group  6 
we  interchange  the  roles  of  p  and  q).  Group  5  corresponds  to  case  3. 


For all groups except group 5 we calculate

$$2\left[X_j^{1/2} - (N_j\hat p)^{1/2}\right]\left(1 - N_j/N\right)^{-1/2}.$$

For group 5 we carry out an arc sine transformation:

$$2N_j^{1/2}\left[\arcsin\sqrt{\hat p_j} - \arcsin\sqrt{\hat p}\right]\left(1 - N_j/N\right)^{-1/2}.$$

Group  Tank  X_j  N_j   p̂     N_j p̂   N_j/N    z_j
1      A     0    20    0      0       .50      0
1      B     0    20    0      0       .50      0
2      A     0    20    0      0       .50      0
2      B     0    20    0      0       .50      0
3      A     0    20   .05     1.0     .50     -2.828
3      B     2    20   .05     1.0     .50      1.172
4      A     0    21   .024    0.512   0.512   -2.049
4      B     1    20   .024    0.488   0.488    0.843

Group 6 (roles of p and q interchanged; counts are live fry):

Group  Tank  Live  N_j   q̂     N_j q̂   N_j/N    z_j
6      A     0     20    0      0       .50      0
6      B     0     20    0      0       .50      0

Group 5 (arc sine transformation):

Group  Tank  X_j  N_j   p̂      N_j p̂   N_j/N   p̂_j    z_j
5      A     4    20   0.225   4.5      0.50    0.20   -0.387
5      B     5    20   0.225   4.5      0.50    0.25    0.372
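The square root (case 2) and arc sine (case 3) values in these tables can be reproduced the same way. As with the case 1 sketch, the formulas follow the displayed expressions and are checked against the tabulated numbers; the helper names are hypothetical. For a group where the roles of $p$ and $q$ are interchanged (group 6), pass the live counts instead of the dead counts.

```python
from math import asin, sqrt


def _pooled(counts):
    """Pooled proportion and group total for a list of (X_j, N_j) pairs."""
    total_x = sum(x for x, _ in counts)
    total_n = sum(n for _, n in counts)
    return total_x / total_n, total_n


def standardize_sqrt(counts):
    """Case 2: 2 * (sqrt(X_j) - sqrt(N_j * p_hat)) / sqrt(1 - N_j / N)."""
    p_hat, total_n = _pooled(counts)
    return [2 * (sqrt(x) - sqrt(n * p_hat)) / sqrt(1 - n / total_n)
            for x, n in counts]


def standardize_arcsine(counts):
    """Case 3: 2 * sqrt(N_j) * (asin(sqrt(p_j)) - asin(sqrt(p_hat))) / sqrt(1 - N_j / N)."""
    p_hat, total_n = _pooled(counts)
    return [2 * sqrt(n) * (asin(sqrt(x / n)) - asin(sqrt(p_hat))) / sqrt(1 - n / total_n)
            for x, n in counts]


# DeFoe compound C fry mortality, (dead, fry) per tank.
print([round(v, 3) for v in standardize_sqrt([(0, 20), (2, 20)])])      # group 3
print([round(v, 3) for v in standardize_sqrt([(0, 21), (1, 20)])])      # group 4
print([round(v, 3) for v in standardize_arcsine([(4, 20), (5, 20)])])   # group 5
```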

To  prepare  the  normal  probability  plot  we  order  the  standardized 
values  and  plot  the  i-th  smallest  against  the  plotting  position 
100  x  (i  -  0.5)/12  on  the  probability  scale.  These  values  are  indicated 
below. 


 i   Ordered value   Plotting position
 1      -2.828              4.2
 2      -2.049             12.5
 3      -0.387             20.8
 4       0                 29.2
 5       0                 37.5
 6       0                 45.8
 7       0                 54.2
 8       0                 62.5
 9       0                 70.8
10       0.372             79.2
11       0.843             87.5
12       1.172             95.8


Figure X.2 shows the normal probability plot of these points. Two points lie well below the standard N(0, 1) line; they correspond to Tanks 3A and 4A. Both have frequencies of 0, where the normal approximation is least reasonable, and their companion tanks do not show up as outliers. Before concluding that there is an outlier, therefore, we should compare the proportions in the two replicate tanks for any statistical evidence of differences.

There  is  an  exact  test  for  the  equality  of  two  Poisson  means. 
Nelson  [24]  discusses  this  test  in  detail. 

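To use this test for detecting outlier tanks, we need to test a slightly more general hypothesis. Consider the 2 x 2 table

              Replicate
              A          B
Dead          X          Y
Live       N_A - X    N_B - Y
Total        N_A        N_B

We are in the Poisson case if $X/N_A < .1$ and $Y/N_B < .1$, or if $X/N_A > .9$ and $Y/N_B > .9$.

Let $X \sim P(\lambda_A)$ and $Y \sim P(\lambda_B)$ with $\lambda_A = N_A p_A$ and $\lambda_B = N_B p_B$. If $p_A = p_B$, then $\lambda_A/\lambda_B = N_A/N_B$. We thus wish to test the hypothesis

$$H_0: \lambda_A/\lambda_B = \rho, \qquad \text{where } \rho = N_A/N_B.$$

Nelson's test rejects $H_0$ at level $\alpha$ if

$$\frac{X + 1}{Y} \le \frac{\rho}{F(2X + 2,\ 2Y;\ 1 - \alpha/2)}$$

or if

$$\frac{X}{Y + 1} \ge \frac{\rho}{F(2X,\ 2Y + 2;\ \alpha/2)} \equiv \rho\, F(2Y + 2,\ 2X;\ 1 - \alpha/2).$$

Here $F(\nu_1, \nu_2; \gamma)$ denotes the point at which the F c.d.f. with $\nu_1, \nu_2$ degrees of freedom equals $\gamma$; for example, $F(2, 4; .95) = 6.94$. If $Y = 0$ and $X > 0$, we have only one sided information concerning $\lambda_B$ and can carry out only the one sided test

$$H_0: \lambda_A/\lambda_B = \rho \quad\text{versus}\quad H_1: \lambda_A/\lambda_B > \rho.$$

Nelson's test rejects $H_0$ at level $\alpha$ if $X > \rho F(2, 2X; 1 - \alpha)$.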


We now apply this Poisson test to the outlier detection problem. Groups 3 and 4 give rise to the two extreme points on the plot, -2.828 and -2.049. Let us see whether these should be regarded as outliers.

In Group 3 we have

              Replicate
              A      B     Total
Dead          0      2       2
Live         20     18      38
Total        20     20      40

Thus $X = 0$, $Y = 2$, with $X \sim P(\lambda_A) \equiv P(20 p_A)$ and $Y \sim P(\lambda_B) \equiv P(20 p_B)$.

Since $X = 0$, we can carry out only a one sided test, interchanging the roles of X and Y. Note that $\rho = 1$.

$$H_0: \lambda_B = \lambda_A \quad\text{versus}\quad H_1: \lambda_B > \lambda_A$$

We reject $H_0$ at level $\alpha$ if $Y > F(2, 2Y; 1 - \alpha)$. In our example $Y = 2$, and the critical value $F(2, 4; .95) = 6.94$ exceeds 2. There is thus no statistical evidence of a difference between the two responses in group 3, and Tank 3A is not an outlier.


We now carry out the test for group 4. The situation is less extreme than in group 3, but we go through the test for illustrative purposes.

In Group 4 we have

              Replicate
              A      B     Total
Dead          0      1       1
Live         21     19      40
Total        21     20      41

Thus $X = 0$ with $X \sim P(\lambda_A) \equiv P(21 p_A)$, and $Y = 1$ with $Y \sim P(\lambda_B) \equiv P(20 p_B)$.

Since $X = 0$, we can carry out only a one sided test:

$$H_0: \frac{\lambda_B}{\lambda_A} = \frac{N_B}{N_A} = \frac{20}{21} = 0.952 = \rho \quad\text{versus}\quad H_1: \frac{\lambda_B}{\lambda_A} > 0.952$$

Interchanging the roles of X and Y in the earlier discussion, we reject $H_0$ at level .05 if

$$Y > \rho\, F(2, 2Y; .95) = \frac{N_B}{N_A}\, F(2, 2Y; .95) = .952\, F(2, 2Y; .95).$$

In our case $Y = 1$ and $.952\,F(2, 2; .95) = (.952)(19.0) = 18.088$. Thus we cannot reject $H_0$. There is no statistical evidence of differences in response rates between the tanks, so Tank 4A is not an outlier.
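The same exact test can be computed directly, without F percentage points. Under $H_0$, conditional on $X + Y = n$, $X$ is binomial with $n$ trials and success probability $\rho/(1 + \rho)$. The F-based rules above are the usual F-distribution representation of these binomial tail probabilities. The sketch below uses this conditional binomial form (our choice, to avoid F tables) for the one sided alternative. The tank with deaths plays the role of the first count, and the function names are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(B >= k) for B ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def poisson_ratio_upper_pvalue(count_1, count_2, size_1, size_2):
    """Exact one sided p-value for H0: lambda_1/lambda_2 = rho = size_1/size_2
    against H1: lambda_1/lambda_2 > rho, conditioning on count_1 + count_2.

    Under H0, count_1 given the total is Binomial(total, rho / (1 + rho)).
    """
    rho = size_1 / size_2
    return binomial_upper_tail(count_1, count_1 + count_2, rho / (1 + rho))


# Group 3 fry: tank B (2 dead of 20) against tank A (0 dead of 20).
p_group3 = poisson_ratio_upper_pvalue(2, 0, 20, 20)
# Group 4 fry: tank B (1 dead of 20) against tank A (0 dead of 21).
p_group4 = poisson_ratio_upper_pvalue(1, 0, 20, 21)
print(p_group3, p_group4)
```

Both p-values (0.25 and 20/41) are far above .05, which agrees with the F-table conclusions for Tanks 3A and 4A.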
Holcombe and Phipps compound D

I = 6, J = 4 (i.e. 6 groups, 4 tanks per group)

Embryo Mortality

Group 1  p̂ = 0.35; all expected frequencies greater than 5.
Group 2  p̂ = 0.35; all expected frequencies greater than 5.
Group 3  p̂ = 0.315; all expected frequencies greater than 5.
Group 4  p̂ = 0.39; all expected frequencies greater than 5.
Group 5  p̂ = 0.335; all expected frequencies greater than 5.
Group 6  p̂ = 0.38; all expected frequencies greater than 5.

We apply the transformation suggested in case 1. Data summaries are given below. For each tank within each group we calculate

$$\left(1 - N_j/N\right)^{-1/2}\,\frac{X_j - N_j\hat p}{(N_j\hat p\hat q)^{1/2}}.$$


Group  Tank  X_j  N_j   p̂      N_j p̂   (N_j p̂ q̂)^1/2   N_j/N   z_j
1      A     17   50   0.35    17.5     3.373           0.25   -0.171
1      B     21   50   0.35    17.5     3.373           0.25    1.198
1      C     12   50   0.35    17.5     3.373           0.25   -1.883
1      D     20   50   0.35    17.5     3.373           0.25    0.856
2      A     19   50   0.35    17.5     3.373           0.25    0.514
2      B     14   50   0.35    17.5     3.373           0.25   -1.198
2      C     16   50   0.35    17.5     3.373           0.25   -0.514
2      D     21   50   0.35    17.5     3.373           0.25    1.198
3      A     15   50   0.315   15.75    3.293           0.25   -0.263
3      B     12   50   0.315   15.75    3.293           0.25   -1.315
3      C     24   50   0.315   15.75    3.293           0.25    2.893
3      D     12   50   0.315   15.75    3.293           0.25   -1.315
4      A     20   50   0.39    19.5     3.449           0.25    0.167
4      B     21   50   0.39    19.5     3.449           0.25    0.502
4      C     23   50   0.39    19.5     3.449           0.25    1.172
4      D     14   50   0.39    19.5     3.449           0.25   -1.841
5      A     18   50   0.335   16.75    3.337           0.25    0.433
5      B     19   50   0.335   16.75    3.337           0.25    0.779
5      C     14   50   0.335   16.75    3.337           0.25   -0.952
5      D     16   50   0.335   16.75    3.337           0.25   -0.260
6      A     14   50   0.38    19.0     3.432           0.25   -1.682
6      B     16   50   0.38    19.0     3.432           0.25   -1.009
6      C     25   50   0.38    19.0     3.432           0.25    2.019
6      D     21   50   0.38    19.0     3.432           0.25    0.673

To prepare the normal probability plot, we order the standardized values and plot the i-th smallest against the plotting position 100 x (i - 0.5)/24 on the probability scale. The values are listed below.


 i   Ordered value   Plotting position
 1      -1.883              2.1
 2      -1.841              6.2
 3      -1.682             10.4
 4      -1.315             14.6
 5      -1.315             18.7
 6      -1.198             22.9
 7      -1.009             27.1
 8      -0.952             31.2
 9      -0.514             35.4
10      -0.263             39.6
11      -0.260             43.7
12      -0.171             47.9
13       0.167             52.1
14       0.433             56.2
15       0.502             60.4
16       0.514             64.6
17       0.673             68.7
18       0.779             72.9
19       0.856             77.1
20       1.172             81.2
21       1.198             85.4
22       1.198             89.6
23       2.019             93.7
24       2.893             97.9


Figure X.3 shows the normal probability plot of these points. Because of the within group correlation, the effective sample size is 6(J - 1) = 6(4 - 1) = 18 independent observations. Is the Group 3, Tank C response an outlier? To gauge the statistical evidence that it did not occur just by chance, we calculate the probability that the maximum absolute value of 18 independent standard normal random variables exceeds 2.893. Let $Z_1, Z_2, \ldots, Z_{18}$ be 18 independent standard normal random variables. Then

$$P\left[\max_{1 \le j \le 18} |Z_j| \ge 2.893\right] = 1 - P\left[\text{all } |Z_j| < 2.893\right] = 1 - (.9962)^{18} = 0.066$$

This is of borderline statistical significance, so there is marginal statistical evidence that Group 3 contains an outlying tank.

The scatter plot of embryo mortality versus treatment in Figure VI.5 shows clearly that the data have no trend. It also shows that the Tank 3C response does not stand out from the six groups as a whole. If we knew there was no trend over groups, we could construct a more powerful test by pooling all the tanks and calculating a common p̂. There is no point in doing this, however. The scatter plot shows that Tank 3C is not out of line with the pooled responses, only with the other tanks in Group 3. The reason for this, if any, might be pursued.

Holcombe and Phipps compound D

I = 6, J = 4 (i.e. 6 groups, 4 tanks per group)

Fry Mortality

Group 1  p̂ = 0.06
Group 2  p̂ = 0.08
Group 3  p̂ = 0.08
Group 4  p̂ = 0.13; expected frequencies less than 5 in each tank.
Group 5  p̂ = 0.79; all expected frequencies greater than 5.
Group 6  p̂ = 1.00 (q̂ = 0).

Thus groups 1, 2, 3, 6 correspond to Poisson case 2. Group 4 corresponds to case 3 (possibly to case 2). Group 5 corresponds to case 1.
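Thus for groups 1, 2, 3 we calculate

$$2\left[X_j^{1/2} - (N_j\hat p)^{1/2}\right]\left(1 - N_j/N\right)^{-1/2}.$$

For group 6 (roles of p and q interchanged) we calculate

$$2\left[(N_j - X_j)^{1/2} - (N_j\hat q)^{1/2}\right]\left(1 - N_j/N\right)^{-1/2}.$$

For group 4 we calculate

$$2N_j^{1/2}\left[\arcsin\sqrt{\hat p_j} - \arcsin\sqrt{\hat p}\right]\left(1 - N_j/N\right)^{-1/2}.$$

For group 5 we calculate

$$\left(1 - N_j/N\right)^{-1/2}\,\frac{X_j - N_j\hat p}{(N_j\hat p\hat q)^{1/2}}.$$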


Group 4 (arc sine transformation):

Group  Tank  X_j  N_j   p̂    N_j p̂   N_j/N   p̂_j    z_j
4      A     4    25   .13   3.25     .25     .16     0.493
4      B     5    25   .13   3.25     .25     .20     1.094
4      C     4    25   .13   3.25     .25     .16     0.493
4      D     0    25   .13   3.25     .25     0      -4.259

Group 5 (case 1 transformation):

Group  Tank  X_j  N_j   p̂    N_j p̂   N_j/N   (N_j p̂ q̂)^1/2   z_j
5      A     23   25   .79   19.75    .25     2.037           1.842
5      B     21   25   .79   19.75    .25     2.037           0.709
5      C     16   25   .79   19.75    .25     2.037          -2.126
5      D     19   25   .79   19.75    .25     2.037          -0.425

Group 6 (roles of p and q interchanged; counts are live fry):

Group  Tank  Live  N_j   q̂    N_j q̂   N_j/N    z_j
6      A     0     25    0     0       .25      0
6      B     0     25    0     0       .25      0
6      C     0     25    0     0       .25      0
6      D     0     25    0     0       .25      0

To  prepare  the  normal  probability  plot  we  order  the  standardized 
values  and  plot  the  i-th  smallest  against  the  plotting  position  100  x 
(i  -  0.5) /24  on  the  probability  scale.  These  values  are  indicated  below 


 i   Ordered value   Plotting position
 1      -4.259              2.1
 2      -3.266              6.2
 3      -2.126             10.4
 4      -0.957             14.6
 5      -0.957             18.7
 6      -0.519             22.9
 7      -0.519             27.1
 8      -0.425             31.2
 9       0                 35.4
10       0                 39.6
11       0                 43.7
12       0                 47.9
13       0                 52.1
14       0                 56.2
15       0                 60.4
16       0.438             64.6
17       0.438             68.7
18       0.493             72.9
19       0.493             77.1
20       0.709             81.2
21       0.734             85.4
22       1.094             89.6
23       1.842             93.7
24       1.898             97.9

The  normal  probability  plot  appears  in  Figure  X.4.  It  appears 
that  the  lowest  3  points  are  well  below  the  N(0,  1)  line.  These  three 
points  correspond  to  Tanks  4D,  3B,  5C.  Tanks  4D,  3B  each  have  observed 
frequencies  of  0.  This  is  the  region  where  the  normal  approximation  is 
the  poorest.  Thus  before  we  say  that  there  are  any  outliers,  we  should 
compare  the  tanks  within  the  treatment  groups  using  a  more  appropriate 
exact  test. 

First let's look at Tank 3B, the suspected outlier, and compare its results to those in the other 3 tanks. We can carry out an exact test by means of the Fisher-Irwin test (see Lehmann [25], section 4.5, and Lieberman and Owen [26]).


To carry out the Fisher-Irwin test we adopt the following notational identifications for the table.

              Grp 1     Grp 2     Total
Spec.           X                   k
Ordin.                            N - k
Total           n       N - n       N

Here k ≤ N - k, n ≤ N - n, and k ≤ n. That is, k is the smallest marginal entry and n is the smallest marginal entry in the other margin. X is the entry in the cell corresponding to the (n, k) marginal categories.

In our example

              Tank 3B   Other 3 tanks   Total
Dead             0            8            8   ("special")
Live            25           67           92
Total           25           75          100 = N

Thus k = 8, n = 25, N = 100, and X = x = 0. Entering the Lieberman and Owen hypergeometric distribution tables at these parameters, we obtain P(X ≤ 0) = 0.091. The two sided probability is thus 0.182, which is at most quite marginal.
marginal,  at  most. 

Tank B, however, was not chosen a priori but as the most extreme of the 4 responses. To get a feeling for how extreme this behavior is, we carry out the following approximate calculation:

P(most extreme of 4 independent responses is more significant than the 0.182 level) = 1 - P(all 4 responses less significant than the 0.182 level) = 1 - (1 - 0.182)^4 = 0.55.

We conclude that there is no statistical evidence that the response rate in Tank 3B differs from the response rates in the other tanks of that group. The extreme behavior of the standardized value is due to the inapplicability of the normalizing square root transformation when $X_j = 0$.


We now consider the responses in Group 4.

Group 4

Tank 4D is the suspected outlier. Let's compare its results to those in the other 3 tanks.

              Tank 4D   Other 3 tanks   Total
Dead             0           13           13   ("special")
Live            25           62           87
Total           25           75          100 = N

Thus here k = 13, n = 25, N = 100, and X = x = 0. Entering the Lieberman and Owen tables, we find the one sided probability P(X ≤ 0) ≈ 0.018. The observed two sided significance level is thus 2(0.018) = 0.036.

Tank D was not chosen a priori. Taking selection into account, we have

P(most extreme of 4 tanks more significant than the 0.036 level) = 1 - P(all 4 tanks less significant than the 0.036 level) ≈ 1 - (1 - .036)^4 = 0.14.

There is thus at most a marginal suggestion that the response rate in Tank 4D differs from the response rates in the other tanks of that group. The very extreme appearance of the standardized value on the normal probability plot is again due to the inapplicability of the normalizing transformation when $X_j = 0$.

We now consider the responses in Group 5.

Tank 5C is the suspected outlier. Let's compare its results to those in the other three tanks.

              Tank 5C   Other 3 tanks   Total
Dead            16           63           79
Live             9           12           21   ("special")
Total           25           75          100 = N

In this case the "special" category is "live", so k = 21, n = 25, N = 100, and X = x = 9. Entering the Lieberman and Owen tables, we find that

P(X ≥ 9) = 1 - P(X ≤ 8) = 1 - .9638 = .0362.

The two sided significance probability is thus 2(.0362) = .0724, which is at best marginal. Tank C was not chosen a priori. Taking selection into account, we have

P(most extreme of 4 tanks more significant than the .0724 level) = 1 - (1 - .0724)^4 = .260.

Thus Tank 5C is not significantly different from the others. The expected frequencies are fairly large in this example, so we can also carry out an asymptotic test.
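The hypergeometric probabilities read from the Lieberman and Owen tables, and the selection adjustment $1 - (1 - p)^4$, can be computed exactly with integer binomial coefficients. A minimal sketch with hypothetical function names, using the $k$, $n$, $N$ notation of the Fisher-Irwin table above:

```python
from math import comb


def hypergeom_pmf(x, k, n, N):
    """P(X = x): x 'special' items among n drawn without replacement
    from N items, k of which are special."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)


def hypergeom_lower_tail(x, k, n, N):
    """P(X <= x)."""
    return sum(hypergeom_pmf(i, k, n, N) for i in range(0, x + 1))


def hypergeom_upper_tail(x, k, n, N):
    """P(X >= x)."""
    return sum(hypergeom_pmf(i, k, n, N) for i in range(x, min(k, n) + 1))


def selection_adjusted(p_two_sided, m):
    """Approximate chance that the most extreme of m independent tanks is at
    least this significant, i.e. the text's 1 - (1 - p)**m adjustment."""
    return 1 - (1 - p_two_sided) ** m


# Holcombe and Phipps compound D fry: suspected tank versus the other three.
p_3b = hypergeom_lower_tail(0, k=8, n=25, N=100)    # 0 of the 8 deaths in 3B
p_4d = hypergeom_lower_tail(0, k=13, n=25, N=100)   # 0 of the 13 deaths in 4D
p_5c = hypergeom_upper_tail(9, k=21, n=25, N=100)   # 9 of the 21 survivors in 5C

for tank, p in [("3B", p_3b), ("4D", p_4d), ("5C", p_5c)]:
    print(tank, round(p, 4), round(2 * p, 4), round(selection_adjusted(2 * p, 4), 3))
```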


Observed frequencies, with expected frequencies in parentheses:

              Tank 5C        Other 3 tanks
Dead        16 (19.75)        63 (59.25)
Live         9 (5.25)         12 (15.75)

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(16 - 19.75)^2}{19.75} + \frac{(63 - 59.25)^2}{59.25} + \frac{(9 - 5.25)^2}{5.25} + \frac{(12 - 15.75)^2}{15.75} = 4.52$$

Under the hypothesis of homogeneity, $\chi^2$ is approximately distributed as $\chi^2_1$. Thus $\chi^2$ is significant at the 0.033 level. This is consistent with the conclusion of the Fisher-Irwin test, namely that there is no statistical evidence of differences once selection is accounted for.
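The asymptotic test above is an ordinary Pearson chi-square on a 2 x 2 table without continuity correction. For one degree of freedom the p-value follows from the normal tail, since $\chi^2_1$ is the square of a standard normal. A minimal sketch with a hypothetical function name:

```python
from math import erfc, sqrt


def pearson_chi_square(table):
    """Pearson chi-square statistic and 1-df p-value for a 2 x 2 table
    (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi2 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    return statistic, erfc(sqrt(statistic / 2))


# Rows: dead, live.  Columns: Tank 5C, the other three Group 5 tanks.
stat, p_value = pearson_chi_square([[16, 63], [9, 12]])
print(round(stat, 2), round(p_value, 3))
```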

The normalizing transformation is appropriate for the range of responses in Group 5. We can therefore also regard the normalized value from Tank C, -2.126, as the minimum of 6 x (4 - 1) - 2 = 16 standard normal deviates, disregarding the responses for Tanks 3B and 4D. The probability that a standard normal deviate is less than -2.126 is 0.017. The probability that the minimum of 16 independent standard normal deviates is less than -2.126 is thus 1 - (1 - 0.017)^16 = 0.24.

Again there is no statistical evidence that this value is an outlier.


Jarvinen compound B

I = 6, J = 2 (i.e. 6 groups, 2 tanks per group)

Embryo Mortality

Figure X.5 shows the normal probability plot of standardized values (based on the case 1 transformation). The plot is perfectly symmetric about 0 because J = 2. The effective sample size is 6(J - 1) = 6 independent observations. The reference line in the plot is the standard normal distribution function.

The behavior of the plot suggests that the data contain extrabinomial variation (i.e. random tank to tank variation within groups) rather than outliers. The points lie along a curve with a steeper slope than that of the standard normal distribution function. The extreme points are not outliers, since they lie directly on the curve determined by the other values. Thus the conjecture made in section VIII about the presence of outliers is not borne out. This behavior is the direct opposite of that observed in Figure X.1.

If we draw a line through the plotted points in Figure X.5, the estimated standard deviation (the difference between the 84th and 50th percentiles) is about 1.7. The estimated variance is thus (1.7)^2 = 2.9. This value is very close to the factor $\hat K = 3.07$ that we used in section IX to adjust these data.
XI. TESTING FOR CONCENTRATION RELATED EFFECTS


Once the preliminary graphical displays, tests for tank to tank heterogeneity, and outlier detection procedures are complete, we can proceed to the main portion of the data analysis. This involves comparing responses across treatment groups to reach an inference about what constitutes an "acceptable" concentration. If no tank to tank heterogeneity is evident in the data, the original data may be pooled across tanks within groups and subsequent analyses carried out on a per fish basis. Alternatively, the data can first be adjusted to reflect the increased variation, and the adjusted "data" can then be pooled across tanks and analyzed on a per fish basis. As remarked earlier, we prefer the latter approach.


Before considering statistical procedures to determine acceptable concentrations, we must first define what is considered to be acceptable. According to the guidelines for early life stage tests [8], "...A lower chronic endpoint is the highest tested concentration...which did not cause the occurrence (which was statistically significantly different from the control at the 95% level) of any specified adverse effect and below which no tested concentration caused such an occurrence...An upper chronic endpoint is the lowest tested concentration...which caused the occurrence (which was statistically significantly different from the control at the 95% level) of any specified adverse effect and above which all tested concentrations caused such an occurrence". We are thus interested in determining which concentrations yield results that are (statistically) significantly different from those of the control group. In a later section we will present an alternative notion of acceptable concentration.
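The guideline definitions quoted above reduce to a simple rule once a significance decision (versus control, at the 95% level) is available for each tested concentration. The following Python sketch assumes those decisions have already been made by some test. The function name and the example concentrations are hypothetical.

```python
from typing import Optional, Sequence, Tuple

def chronic_endpoints(concs: Sequence[float],
                      significant: Sequence[bool]) -> Tuple[Optional[float], Optional[float]]:
    """Return (lower, upper) chronic endpoints following the guideline wording.

    concs: tested concentrations (control excluded), sorted ascending.
    significant: True where the group differed significantly from the control.
    """
    if len(concs) != len(significant):
        raise ValueError("need one significance flag per concentration")
    if any(b < a for a, b in zip(concs, concs[1:])):
        raise ValueError("concentrations must be sorted ascending")
    lower = None
    for c, sig in zip(concs, significant):                        # walk up to the first effect
        if sig:
            break
        lower = c
    upper = None
    for c, sig in zip(reversed(concs), reversed(significant)):    # walk down while all show effects
        if not sig:
            break
        upper = c
    return lower, upper

print(chronic_endpoints([1, 2, 4, 8, 16], [False, False, True, True, True]))  # (2, 4)
print(chronic_endpoints([1, 2, 4, 8, 16], [False, True, False, True, True]))  # (1, 8)
```

When the significance pattern is not monotone, the two endpoints need not be adjacent tested concentrations, as the second call shows.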


Opinion: For the purpose of testing hypotheses concerning heterogeneity of response rates across groups, or of constructing confidence intervals to compare treatment group and control group response rates, unless there is relatively strong statistical evidence of heterogeneity among tanks within groups (e.g. observed significance level less than 0.05 or 0.10), act as if there is no heterogeneity of response from tank to tank within groups. Base subsequent tests and confidence intervals on the original (i.e. unadjusted) data, pooled across tanks within groups.


This suggestion reflects a conservative viewpoint with respect to the conclusions drawn from such subsequent analyses. Namely, suppose that such tank to tank variation within treatment groups exists but we do not detect it. Then we proceed as if none exists. Thus the "true" variability of the test statistics that we use will exceed the assumed variability, and if we carry out a test at nominal level α = 0.05, say, the "true" α-level will in fact be greater than the nominal.

Inflating the actual α-level over the nominal level makes the test more prone to reject the hypothesis of equality of treatment and control groups than a test whose actual level equals α. Thus if we err, it will be on the side of declaring a treatment group significantly different from the control group when it in fact is not. This is conservative.
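The size of this inflation can be gauged with a simple normal approximation. The sketch below assumes that the test statistic is approximately normal and that undetected tank to tank variation multiplies its variance by a factor k. The value 3.07 in the example is borrowed from the section IX adjustment purely for illustration.

```python
from statistics import NormalDist

def actual_alpha(nominal_alpha: float, k: float) -> float:
    """Actual size of a two-sided z test run at nominal_alpha when Var(statistic) = k, not 1."""
    std = NormalDist()
    crit = std.inv_cdf(1.0 - nominal_alpha / 2.0)   # about 1.96 when nominal_alpha = 0.05
    # P(|Z| > crit) when Z ~ N(0, k) equals P(|N(0, 1)| > crit / sqrt(k)).
    return 2.0 * (1.0 - std.cdf(crit / k ** 0.5))

for k in (1.0, 1.5, 3.07):
    print(f"k = {k:4.2f}: actual alpha = {actual_alpha(0.05, k):.3f}")
```

With k = 3.07 a nominal 5% test rejects a true null hypothesis roughly a quarter of the time, which is the direction of error the text calls conservative.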


The  chi  square  test  is  an  overall  test  of  this  hypothesis  (i.e. 
a  shotgun  test).  It  is  based  on  the  following  statistic: 

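Let $(X_{ij}, N_{ij})$ denote the number of responses and the number of fish, respectively, in the j-th tank within the i-th treatment group, $i = 1, \ldots, I$; $j = 1, \ldots, J$. Define

$$N_{i+} = \sum_{j=1}^{J} N_{ij}, \quad X_{i+} = \sum_{j=1}^{J} X_{ij}, \quad N_{++} = \sum_{i=1}^{I} \sum_{j=1}^{J} N_{ij}, \quad X_{++} = \sum_{i=1}^{I} \sum_{j=1}^{J} X_{ij}.$$

Then $\hat{p}_{++} = X_{++}/N_{++}$ is the estimate of the common value of p under $H_0$, and

$$\chi^2 = \sum_{i=1}^{I} \frac{(X_{i+} - N_{i+}\hat{p}_{++})^2}{N_{i+}\hat{p}_{++}(1 - \hat{p}_{++})}$$

is the $\chi^2$ test of $H_0$. Under $H_0$, $\chi^2$ is approximately distributed as $\chi^2_{I-1}$. Since the chi square test is sensitive to all kinds of departures from $H_0$, it is not tailor made to be sensitive to ordered alternatives, of the type most commonly encountered in toxicology. We will discuss this further later.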
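The pooled statistic above can be computed directly from tank-level counts. The following Python sketch pools over tanks, forms the common rate estimate, and returns the chi square statistic together with its I - 1 degrees of freedom. The function name and the example counts are hypothetical.

```python
from typing import Sequence

def pooled_chi_square(x: Sequence[Sequence[int]],
                      n: Sequence[Sequence[int]]) -> tuple[float, int]:
    """Chi square statistic for H0: equal response probability in all I groups.

    x[i][j] and n[i][j] are the responders and fish in tank j of group i.
    Counts are pooled over tanks before the statistic is formed.
    """
    x_group = [sum(tanks) for tanks in x]   # X_{i+}
    n_group = [sum(tanks) for tanks in n]   # N_{i+}
    p_hat = sum(x_group) / sum(n_group)     # common rate under H0: X_{++} / N_{++}
    if p_hat in (0.0, 1.0):
        raise ValueError("statistic is undefined when no fish or all fish respond")
    chi2 = sum((xi - ni * p_hat) ** 2 / (ni * p_hat * (1.0 - p_hat))
               for xi, ni in zip(x_group, n_group))
    return chi2, len(x_group) - 1

# Hypothetical data: 2 groups, each with 2 tanks of 50 fish.
stat, df = pooled_chi_square([[4, 6], [9, 11]], [[50, 50], [50, 50]])
print(f"chi square = {stat:.4f} with {df} d.f.")
```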

The chi square test is easy to carry out from a computational standpoint since many standard programs are available. For example, the procedure PROC FREQ in the SAS statistical computing system [12] can be used to carry out this test. The program BMDP1F in the BMDP statistical computing system [9] can also be used for this purpose. Figure XI.1 illustrates output from SAS PROC FREQ to test for homogeneity of response rates across groups for the fry mortality data from the Holcombe and Phipps test on compound D. This test is based on data pooled across tanks within groups. Of course, the test rejects $H_0$ very strongly, as it should based on the appearance of the preliminary scatterplot.


We have also incorporated this test into our EXAX2 computer program [14]. We pool responses across tanks within groups and compute the chi square test. If the expected frequency in each cell exceeds the cutoff, we evaluate the significance of chi square based on its asymptotic distribution under $H_0$.

If any expected frequency is less than the cutoff, we evaluate the exact small sample distribution of chi square, conditional on the margins, by enumeration as discussed in the section on the exact chi square program.
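The enumeration step can be sketched as follows. This is not EXAX2's code, which is not reproduced in this section, but a minimal Python illustration of the same idea. Both margins of the pooled 2 x I table are held fixed, every possible table is generated and weighted by its multivariate hypergeometric probability, and the probabilities of tables whose chi square is at least the observed value are summed. The function names are hypothetical, and full enumeration is practical only for small tables.

```python
from math import comb
from typing import List, Sequence

def chi_square_stat(x: Sequence[int], n: Sequence[int]) -> float:
    """Pearson chi square for a 2 x I table given responders x and group sizes n."""
    p = sum(x) / sum(n)
    return sum((xi - ni * p) ** 2 / (ni * p * (1 - p)) for xi, ni in zip(x, n))

def exact_chi_square_pvalue(x: Sequence[int], n: Sequence[int]) -> float:
    """Exact P(chi square >= observed), conditional on both margins, by full enumeration."""
    total_x, total_n = sum(x), sum(n)
    if total_x in (0, total_n):
        return 1.0                      # only one table is possible
    observed = chi_square_stat(x, n)
    denom = comb(total_n, total_x)
    p_value = 0.0

    def visit(i: int, remaining: int, partial: List[int], weight: int) -> None:
        nonlocal p_value
        if i == len(n) - 1:             # last group receives the remaining responders
            if remaining <= n[i]:
                table = partial + [remaining]
                if chi_square_stat(table, n) >= observed - 1e-9:   # tolerance for float ties
                    p_value += weight * comb(n[i], remaining) / denom
            return
        for xi in range(min(n[i], remaining) + 1):
            visit(i + 1, remaining - xi, partial + [xi], weight * comb(n[i], xi))

    visit(0, total_x, [], 1)
    return p_value

print(exact_chi_square_pvalue([0, 3], [3, 3]))
```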


We illustrate this feature of EXAX2 with the DeFoe compound C and with the Holcombe and Phipps compound D data. We tested for heterogeneity of response rates across groups for the fry mortality and the embryo mortality data. The results of these tests are shown in Figures XI.2-XI.5.

Figure XI.2 illustrates the chi square test of homogeneity across treatment groups for the embryo mortality data in the DeFoe compound C experiment. It will be recalled that no tank to tank heterogeneity within treatment groups was found by the preliminary test, and so the data have been pooled across the tanks within treatment groups. The first matrix displays the observed 2 x 6 table. The second matrix displays the expected (under $H_0$) frequencies. Since each expected frequency exceeds 5 (by a great deal in this example), the test is based on asymptotic theory.

NOTE THAT SINCE THIS IS A PRELIMINARY TEST WE SHOULD BE VERY LIBERAL IN DECIDING WHEN TO REJECT $H_0$ AND TO GO ON TO MORE DETAILED COMPARISONS. THUS A LARGE α-VALUE, E.G. α = 0.20 OR α = 0.25, SHOULD BE USED. THIS ENHANCES THE SENSITIVITY OF THE TEST TO DETECT MODERATE DEPARTURES FROM $H_0$.

We see from the bottom of Figure XI.2 that the observed significance level is 0.31. Thus even by our liberal yardstick we see no statistical evidence of group to group differences in embryo mortality in the DeFoe data. This agrees with the appearance of the preliminary scatter plots.

The same test was carried out for the DeFoe compound C fry mortality data. The results are given in Figure XI.3.

Again,  all  the  expected  frequencies  exceed  5.0  and  so  the 
asymptotic  theory  is  used.  This  time  the  chi  square  statistic 
is  highly  significant.  Chi  square  =  182.79  with  5  d.f.  Thus 
there  is  strong  statistical  evidence  of  group  to  group  response 
rate  differences  in  fry  mortality.  This  again  agrees  with  the 
appearance  of  the  preliminary  scatter  plot. 


Figures XI.4 and XI.5 contain the results of the chi square tests of homogeneity across groups for the Holcombe and Phipps compound D embryo mortality and fry mortality data, respectively. Again the data have been pooled across tanks within groups.

In both cases the expected frequencies exceed 5 and so asymptotic theory is used. We see that for the embryo mortality data the test is nonsignificant even at the liberalized α = 0.20. For the fry mortality data the test is very highly significant (chi square = 389.68 with 5 d.f.). Thus again there is no statistical evidence of group to group differences with respect to embryo mortality, while there is strong statistical evidence of group to group differences with respect to fry mortality. This is in good agreement with the appearance of the preliminary scatterplots.

Figures XI.6 and XI.7 contain the results of the chi square tests of homogeneity across groups for the Jarvinen compound B embryo mortality and fry mortality data, respectively. The data have been pooled across tanks within groups. In both cases all the expected frequencies exceed 5 and so asymptotic theory is used.

For the fry mortality data the test is very highly significant, as was the case with the fry mortality data from the other experiments considered. It is quite clear that the last two treatment groups have substantially higher response rates than the first two groups.

In  contrast  to  the  cases  for  the  DeFoe  and  Holcombe  and 
Phipps  data  sets,  there  is  some  statistical  evidence  of  group  to 
group  differences  in  embryo  mortality  rates. 

We also saw strong statistical evidence in Sections VII and X of heterogeneity in response rates among tanks within groups. In Section IX we calculated an adjustment factor of K = 3.071 for these responses, to account for the tank to tank heterogeneity.

The effect of this adjustment on chi square is to divide it by the factor K. Thus with respect to the adjusted "responses", the observed chi square value becomes 10.71426/K = 10.71426/3.071 = 3.489. The probability that a chi square random variate with 5 d.f. exceeds 3.489 is 0.625. Thus the tank to tank heterogeneity within groups accounts for the significant chi square across groups, and again there is no statistical evidence of variation in embryo mortality rate across groups.
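The adjusted calculation above can be reproduced in a few lines. The Python sketch below divides the observed chi square by K and evaluates the upper tail of the chi square distribution with the standard closed-form expressions for integer degrees of freedom. Using these closed forms is our choice, made to avoid depending on a statistics library. The numbers in the example are the ones quoted in the text.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """P(chi square with integer df > x), using the closed forms for even and odd df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (x / 2.0) / k
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: 2(1 - Phi(r)) + 2 phi(r) * sum_{j=1}^{(df-1)/2} r^(2j-1) / (1*3*...*(2j-1)), r = sqrt(x)
    root = math.sqrt(x)
    tail = math.erfc(root / math.sqrt(2.0))            # equals 2 * (1 - Phi(root))
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    term, series = root, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        series += term                                  # root^(2j-1) / (1*3*...*(2j-1))
        term *= x / (2 * j + 1)
    return tail + 2.0 * density * series

def adjusted_chi_square(chi2: float, k: float, df: int) -> tuple[float, float]:
    """Divide chi square by the heterogeneity factor k; return (adjusted value, p-value)."""
    adjusted = chi2 / k
    return adjusted, chi2_sf(adjusted, df)

adj, p = adjusted_chi_square(10.71426, 3.071, 5)
print(f"adjusted chi square = {adj:.3f}, P(chi square with 5 d.f. > adjusted) = {p:.3f}")
```

For comparison, chi2_sf(10.71426, 5) gives roughly 0.06 for the unadjusted statistic, so the adjustment moves the result from borderline to clearly nonsignificant.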

One  Sided  Tests  of  Homogeneity  Across  Treatment  Groups 


The shotgun chi square test, although the most commonly used test of homogeneity of response rates, is not the most appropriate test for application to toxicity data. The chi square test is an overall test which is not designed to be particularly sensitive to the one sided, monotone alternatives characteristic of dose response tests. More specialized tests have been designed to be more powerful against alternatives of this type.

Several tests of response rate homogeneity against ordered alternatives are discussed in the literature. Snedecor and Cochran [28], section 9.11, and Steel and Torrie [29], section 22.10, extract a single degree of freedom from the overall chi square test to test for linear regression in 2 x K tables where the columns fall in a natural order. Scores, $Z_j$, are assigned (arbitrarily) to the columns to treat them as values on a continuous scale of measurement. The weighted linear regression coefficient of mortality probability on score $Z_j$ is calculated and tested for significance. The major drawback of this method is the arbitrariness of the scores. See [28, 29] for details.
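The single degree of freedom for linear trend can be computed as below. This Python sketch uses the Cochran-Armitage form of the statistic, $[\sum_j (X_j - N_j\bar{p}) Z_j]^2 / [\bar{p}(1-\bar{p}) \sum_j N_j (Z_j - \bar{Z})^2]$, which is algebraically the same as the weighted regression of mortality proportion on score described in [28, 29]. The scores are supplied by the user, which is exactly the arbitrariness noted above. The function name and the example counts are hypothetical.

```python
import math
from typing import Sequence

def trend_chi_square(x: Sequence[int], n: Sequence[int],
                     scores: Sequence[float]) -> tuple[float, float]:
    """Chi square (1 d.f.) for a linear trend in response proportion across scored groups.

    Returns (chi square, one-sided p-value for a proportion that increases with score).
    """
    total_n = sum(n)
    p_bar = sum(x) / total_n
    z_bar = sum(ni * zi for ni, zi in zip(n, scores)) / total_n
    # sum(observed - expected) is 0, so weighting by raw or centered scores gives the same value.
    numerator = sum((xi - ni * p_bar) * zi for xi, ni, zi in zip(x, n, scores))
    s_zz = sum(ni * (zi - z_bar) ** 2 for ni, zi in zip(n, scores))
    z = numerator / math.sqrt(p_bar * (1.0 - p_bar) * s_zz)
    one_sided_p = 0.5 * math.erfc(z / math.sqrt(2.0))   # P(N(0, 1) > z)
    return z * z, one_sided_p

# Hypothetical 6-group experiment, 100 fish per group, mortality rising linearly.
chi2_trend, p_trend = trend_chi_square([10, 12, 14, 16, 18, 20], [100] * 6, range(6))
print(f"trend chi square = {chi2_trend:.3f}, one-sided p = {p_trend:.4f}")
```

For these hypothetical counts the overall chi square (5 d.f.) is also about 5.49, because the trend is exactly linear. The difference is that the trend statistic is referred to 1 d.f., giving a one-sided p of about 0.01 instead of an overall p of roughly 0.36.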

An alternative approach to the construction of one sided tests is by means of measures of association for ordered contingency tables. Such measures can be thought of as analogs, for qualitative responses, of correlation coefficients for quantitative responses. Goodman and Kruskal [30, 31] have derived and reported on a number of measures. Several commonly used measures of association are Kendall's $\tau_b$, Stuart's $\tau_c$, and Goodman and Kruskal's $\gamma$, just to name a few. For a given table each of these measures yields a different numerical value, and so it is not clear how to ascribe physical meaning to these values. However, for each of the measures a value of zero means no monotone association between categories, and positive or negative values mean positive or negative associations, respectively. Thus a test of homogeneity of treatment group response rates that is sensitive to monotone, one sided alternatives can be constructed by testing the null hypothesis that these measures are zero against a one sided alternative. Brown and Benedetti [32] have calculated improved standard error estimates for the various measures that are appropriate studentizing factors to test the null hypothesis that these measures are zero. They show empirically that their new standard errors provide better approximations in small and moderate samples than do the older standard error estimates reported by Goodman and Kruskal [31]. Furthermore, they show that a number of measures, each having different numerical values, result in identical "t ratios" when normalized by their respective Brown and Benedetti standard error estimates. This is desirable because we need consider just one "t ratio" rather than five. Proctor [33] shows that tests based on measures of association are in fact much more powerful against one sided, monotone alternatives than is the shotgun chi square test, as would be intuitively expected.


Agresti and Wackerly [15] also discuss one sided tests of homogeneity based on measures of association. They discuss Kendall's $\tau_b$ in some detail. They illustrate the increased sensitivity of such measure of association tests for detecting ordered departures from homogeneity with the following artificial example. Consider the 3 by 3 contingency table with ordered categories:


There is clearly a positive trend in the table; however, the Fisher-Irwin test (exact) shows significance level α = 0.514. This test would thus miss the trend. However, Kendall's $\tau_b$ test (exact) is significant at α = 0.053 and so would detect the trend.

Agresti and Wackerly also comment that the asymptotic normal approximation to the distribution of the sample estimate of $\tau_b$ may be quite poor for small sample sizes. They report that the actual significance level can be substantially greater than the nominal for small sample sizes; i.e. we reject $H_0$ when it is correct far more than the nominal proportion of times. This is at least in part due to the maximum likelihood estimate of the standard error of $\tau_b$ having a negative bias. Based on Agresti and Wackerly's example, asymptotic normal distribution theory would be suspect at least for N below 50. Agresti and Wackerly suggest that an alternative, exact conditional test against ordered alternatives can be used, based on measures of association and enumeration of tables, when the sample sizes are too small to apply asymptotic normal theory.
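An exact conditional test of the kind Agresti and Wackerly describe can be sketched for the 2 x C tables of this report. In the Python sketch below, which is our illustration and not their program, the statistic is S = C - D, the number of concordant minus discordant pairs. With both margins fixed, tests based on Kendall's tau b, Stuart's tau c and Somers' d are equivalent to tests based on S, because their denominators depend only on the margins. Rows are taken as (no response, response) and columns as increasing concentration, so a large S means the response rate rises with concentration. The function names and example counts are hypothetical.

```python
from math import comb
from typing import List, Sequence

def s_statistic(x: Sequence[int], n: Sequence[int]) -> int:
    """Concordant minus discordant pairs in a 2 x J table.

    x[j] responders out of n[j] fish in column j; columns in increasing concentration.
    """
    s = 0
    for a in range(len(n)):
        for b in range(a + 1, len(n)):
            s += (n[a] - x[a]) * x[b]   # concordant: non-responder at lower, responder at higher conc.
            s -= x[a] * (n[b] - x[b])   # discordant: responder at lower, non-responder at higher conc.
    return s

def exact_trend_pvalue(x: Sequence[int], n: Sequence[int]) -> float:
    """One-sided exact P(S >= S_observed), conditional on both margins, by enumeration."""
    total_x, total_n = sum(x), sum(n)
    observed = s_statistic(x, n)
    denom = comb(total_n, total_x)
    p_value = 0.0

    def visit(i: int, remaining: int, partial: List[int], weight: int) -> None:
        nonlocal p_value
        if i == len(n) - 1:             # last column receives the remaining responders
            if remaining <= n[i]:
                table = partial + [remaining]
                if s_statistic(table, n) >= observed:
                    p_value += weight * comb(n[i], remaining) / denom
            return
        for xi in range(min(n[i], remaining) + 1):
            visit(i + 1, remaining - xi, partial + [xi], weight * comb(n[i], xi))

    visit(0, total_x, [], 1)
    return p_value

print(exact_trend_pvalue([0, 2], [2, 2]))
```

When "response" means death, this upper tail corresponds to the counter-association studied in this report (survival falling with concentration); relabeling the rows flips the sign of S.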

The  applicability  of  asymptotic  distribution  theory  for  the 
sample  sizes  and  magnitudes  of  response  proportions  encountered 
in  fish  toxicity  tests  is  a  matter  for  detailed  future  study, 
probably  by  simulation.  This  is  too  involved  for  us  to  consider 
here.  However  as  we  use  this  test  only  on  pooled  data  (original 
or  adjusted)  across  tanks  within  groups,  the  sample  sizes  would 
be  expected  to  be  reasonably  large  (N  in  excess  of  200)  and  so 
we  utilize  asymptotic  theory  for  the  remainder  of  this  section. 

It should be noted that ordinal measures of association assume that as one variable (e.g. concentration) increases, the other variable (e.g. percent mortality or percent abnormality) either increases monotonically or decreases monotonically. Nonmonotone relations (e.g. first increasing and then decreasing) can well result in small or even zero values of the measures. This is analogous to properties of correlation coefficients.


Goodman and Kruskal [30] define and discuss the properties of a number of measures of association for cross classifications. They include a section on measures for ordered categories (i.e. ordinal data). They propose a measure, $\gamma$, which is defined as follows.

Suppose two individuals are drawn at random from a population described jointly by two discrete, ordered categories:

Category 1: i = 1, ..., I
Category 2: j = 1, ..., J

In our fish toxicity examples I = 2 (e.g. live, dead) and J = C (the number of groups).

Let (i, j) and (i', j') denote the (random) indices of these two individuals within the two categories. If there is an ordered correspondence between categories, we should see the same (or opposite) orderings of each of the categories, depending on the direction of association. Let

$$\Pi_s = P[(i > i' \text{ and } j > j') \text{ or } (i < i' \text{ and } j < j')] = P\{\text{same}\}$$

$$\Pi_d = P[(i > i' \text{ and } j < j') \text{ or } (i < i' \text{ and } j > j')] = P\{\text{diff}\}$$

$$\Pi_t = P[i = i' \text{ or } j = j'] = P\{\text{tie}\}$$

To avoid ambiguity they condition on the absence of ties. The conditional probability of like orders given no ties is $\Pi_s/(1 - \Pi_t)$. The conditional probability of unlike orders given no ties is $\Pi_d/(1 - \Pi_t)$. The difference of these two probabilities is defined as $\gamma$. Namely,

Goodman and Kruskal's Gamma

$$\gamma = \frac{\Pi_s - \Pi_d}{1 - \Pi_t} = \frac{\Pi_s - \Pi_d}{\Pi_s + \Pi_d}.$$

When the two categories are independent, $\Pi_s = \Pi_d$, and so $\gamma = 0$. However, the converse is not necessarily true (except in the 2 x 2 case).


Kendall's $\tau_b$ and Stuart's $\tau_c$ measures are related to $\gamma$. Let m = min(I, J). Then

$$\tau_c = \frac{\Pi_s - \Pi_d}{(m - 1)/m}.$$

This modification is made so that $\tau_c$ can nearly attain the absolute value 1 for nonsquare tables when the entire population lies on a longest diagonal of the table.

Kendall's $\tau_b$ is also related to $\gamma$. Namely, let $p_{ij}$ denote a cell proportion, $p_{i+} = \sum_j p_{ij}$ and $p_{+j} = \sum_i p_{ij}$. Then

$$\tau_b = \frac{\Pi_s - \Pi_d}{\sqrt{\left(1 - \sum_{i=1}^{I} p_{i+}^2\right)\left(1 - \sum_{j=1}^{J} p_{+j}^2\right)}}.$$

$\tau_b$ corrects for pairs of observations tied with respect to at least one of the categorizations and ranges between -1 and +1.

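Somers' d (asymmetric measure)

$$d_{R|C} = \frac{\Pi_s - \Pi_d}{1 - \Pi_t + Y_0}, \qquad d_{C|R} = \frac{\Pi_s - \Pi_d}{1 - \Pi_t + X_0},$$

where $Y_0$ = probability of a tie in row only and $X_0$ = probability of a tie in column only.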

For all of these measures a zero value indicates a complete lack of a monotone relationship between the two variables (no association). A value of +1 indicates a perfect monotone increasing relationship (perfect agreement) and a value of -1 indicates a perfect monotone decreasing relationship (perfect disagreement). It should be noted that lack of a monotone relationship is not the same as statistical independence. These measures can equal zero when there is dependence of a complicated form. However, when the variables are independent, the measures will equal zero. Kendall's $\tau_b$ differs from the others in that it can reach a value of ±1 only for square tables; otherwise its maximum is lower. Stuart's $\tau_c$ is an adjustment of $\tau_b$ that can attain the value ±1 for nonsquare tables.
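The sample versions of these measures replace the pair probabilities by counts of concordant (C) and discordant (D) pairs. The Python sketch below uses the common textbook sample formulas. This is our assumption; it does not reproduce BMDP1F's code. It applies to an I x J table of counts whose rows and columns are both in increasing order, and the function names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

def pair_counts(table: Sequence[Sequence[int]]) -> Tuple[int, int]:
    """Concordant (C) and discordant (D) unordered pairs in an I x J table of counts."""
    rows, cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(rows):
        for j in range(cols):
            for i2 in range(i + 1, rows):        # second member in a higher row
                for j2 in range(cols):
                    if j2 > j:                   # higher row and higher column: same order
                        concordant += table[i][j] * table[i2][j2]
                    elif j2 < j:                 # higher row but lower column: opposite order
                        discordant += table[i][j] * table[i2][j2]
    return concordant, discordant

def ordinal_measures(table: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Sample gamma, tau_b, tau_c and Somers' d for a table with ordered rows and columns."""
    c, d = pair_counts(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    pairs = n * (n - 1) / 2
    same_row = sum(t * (t - 1) / 2 for t in row_totals)   # pairs tied on the row variable
    same_col = sum(t * (t - 1) / 2 for t in col_totals)   # pairs tied on the column variable
    m = min(len(table), len(table[0]))
    return {
        "gamma": (c - d) / (c + d),
        "tau_b": (c - d) / math.sqrt((pairs - same_row) * (pairs - same_col)),
        "tau_c": 2 * m * (c - d) / (n * n * (m - 1)),
        "d_R|C": (c - d) / (pairs - same_col),   # rows treated as the dependent variable
        "d_C|R": (c - d) / (pairs - same_row),   # columns treated as the dependent variable
    }

measures = ordinal_measures([[3, 1], [1, 3]])
for name, value in measures.items():
    print(f"{name:6s} {value:.3f}")
```

For this 2 x 2 example gamma is 0.8 while the other four measures are 0.5. The measures take different values on the same table, even though, as the text goes on to note, they lead to the same test.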

There is much discussion in the literature about which measures most realistically portray strengths of monotone relations. In general it is difficult to interpret the magnitudes of these measures in any physically meaningful fashion. However, we will be using these measures of association only for tests of significance to detect departures from 0. For this application the situation simplifies considerably since Brown and Benedetti, page 311, show that basing a test of significance on $\gamma$, $\tau_b$, $\tau_c$, $d_{C|R}$, or $d_{R|C}$ leads to exactly the same test statistic, sample for sample. Thus we do not need to be concerned with differences among the values of the measures. That is, Brown and Benedetti [125] have derived new estimates, $ASE_0$, of the asymptotic standard errors of the measures of association that are better than those given previously in the literature for testing the null hypothesis that the measure is zero. They report one set of standard errors to use for testing purposes and another set of standard errors to use for constructing confidence intervals. They show that

$$\frac{\gamma}{ASE_0(\gamma)} = \frac{\tau_b}{ASE_0(\tau_b)} = \frac{\tau_c}{ASE_0(\tau_c)} = \frac{d_{R|C}}{ASE_0(d_{R|C})} = \frac{d_{C|R}}{ASE_0(d_{C|R})},$$

which means that the five measures all give the same test of the null hypothesis of no (monotone) association.

Brown and Benedetti report a simulation study of the use of these T ratios to test the null hypothesis. They compared the T-values to the percentage points of the standard normal distribution. They concluded that

1. The $ASE_0$'s give empirical type I error rates closer to the nominal significance level and more consistent for different patterns of non-association than do previously reported standard error estimates.

2. For N > 100 the T-value can safely be compared to the percentile points of the standard normal distribution.

3. For N = 50 the distribution of the T-value seems to have heavier than normal tails, and they recommend comparing it to Student's t with approximate degrees of freedom (ADF) = 0.4N.

Their T-ratios for testing the null hypothesis of no (monotone) association, and the (asymptotic) standard errors appropriate for constructing confidence intervals on the measures of association, are implemented in the BMDP [27] program BMDP1F, measures of association for two way frequency tables. (Note that BMDP1F was extensively rewritten and reissued in August 1976. Thus only versions of this program dated after August 1976 are based on the most up to date theory.) We will illustrate the use of this program in this section, with both artificial and real data.

Proctor [33] has discussed the relative efficiencies of tests of association for ordered two way contingency tables and has compared these efficiencies with that of the chi square test. He reports that in most cases of ordered alternatives, the efficiencies of tests based on the measures of association are much greater than that of tests based on chi square. For one example of a 6 x 6 ordered contingency table constructed from an underlying bivariate normal distribution with correlation ρ = 0.80, the efficiencies of the tests based on measures of association relative to the chi square test were about 3.4. This means that for the chi square test to attain the same power against this alternative as a test based on the measures of association, it would need to be based on more than three times as many observations. In efficiency calculations based on other assumptions about the alternatives, the chi square procedure was consistently much less efficient than test procedures based on measures of association.
To get some further feeling for the sensitivities to ordered alternatives of tests based on measures of association, as compared with the chi square test, we constructed several artificial sets of data having varying degrees of monotone trend in response probabilities. We tested the null hypothesis of no association between mortality level and treatment group using the one sided tests based on ordinal measures of association and the shotgun chi square test. Both of these tests can be carried out using the BMDP program [27] BMDP1F (versions subsequent to August 1976). We should note that in these one sided tests we are looking for counter-association. That is, the probability of being alive decreases as concentration group increases. We are thus testing for departures from 0 in a negative direction.


Figure XI.8 contains instructions for using program BMDP1F. Figures XI.9, XI.10 and XI.11 illustrate one sided and chi square tests on tables (based on artificial data) that exhibit linear trends of response probability with concentration group, but with differing slopes. They represent mild, moderate, and strong trends. In each case the one sided test based on measures of association reflects a much stronger association between categories than does the chi square test.

In conclusion, we see that the various ordinal measures of association provide equivalent tests of the null hypothesis of no association between mortality rate and concentration group. Furthermore, all the tests are much more sensitive than the chi square test to alternatives of a monotonic nature.

Appendix A5, pp. 778-792, of the 1979 BMDP manual [27] and Brown and Benedetti [32] are helpful in interpreting the output from BMDP1F. Brown and Benedetti calculate two asymptotic standard errors for each measure, $ASE_0$ and $ASE_1$. $ASE_1$ is derived assuming the alternative hypothesis is true, i.e. the measure is not zero. It is obtained by the method of Goodman and Kruskal [31] and is appropriate for setting confidence limits on the measures for large samples. Brown and Benedetti discuss the use of $ASE_1$ in computing confidence limits and power in an unpublished technical report that is available from the Health Sciences Computing Facility at UCLA.

$ASE_0$ is computed under the null hypothesis that the measure is zero. It was derived by Brown and Benedetti in the 1977 paper cited above.

The T-value for each measure is the ratio of the measure to its $ASE_0$. Brown and Benedetti report, based on simulation studies, that the use of $ASE_0$ in the denominator of the T-statistic, rather than $ASE_1$ or other suggestions made in the literature, gives superior results in that the attained type I error rates are closer to nominal and are more consistent for differing patterns of probabilities.

To illustrate the use of these one sided tests of association against ordered alternatives, as compared with the chi square test of significance, we ran the BMDP program BMDP1F on mortality data from the DeFoe compound C test and from the Holcombe and Phipps compound D tests. The results are shown in Figures XI.12-XI.15, both for embryo mortality data and for fry mortality data. Qualitatively there is no difference, in these data, in the conclusions arrived at by each procedure. The relationships between concentration and percent response are so strong in the fry mortality data that the observed significance levels are 0 to many decimal places. For the embryo mortality data, neither procedure reveals a statistically significant relation between concentration and percent mortality. Since the observed mortality in the DeFoe embryo data is smaller at the higher concentrations than at the lower, the measures of association are positive and the observed significance level is higher for the one sided test than for the chi square test. We almost have a significant trend in the wrong direction! Why? It may be due to the outlier tank in group 2.

In conclusion, it should be remarked that any overall test for concentration related effects is just a screening device. It merely states whether there is any statistical evidence of concentration related effects, but does not provide any indication of which treatment groups have responses that differ from the control group. That is the role of multiple comparisons. The overall test is intended to screen out those data sets for which multiple comparisons would be a futile exercise because no differences exist. In this regard it should be noted that since the overall test is just a preliminary procedure, it makes sense to use a very liberal α-level, like α = 0.20 or perhaps even α = 0.50. This improves the sensitivity of the test to detect marginal effects, but at the expense of an increased false rejection rate. However, such false rejections of the null hypothesis will be detected later in the multiple comparison phase.
Figure XI.1 SAS PROC FREQ output from chi square test for homogeneity across groups applied to fry mortality data from Holcombe and Phipps test on Compound D
Figure XI.2 EXAX2 output: chi square test of homogeneity across groups
Figure XI.3 EXAX2 output: chi square test of homogeneity across groups
Figure XI.4 EXAX2 output: chi square test of homogeneity across groups
Figure XI.5 EXAX2 output: chi square test of homogeneity across groups
Figure XI.6 EXAX2 output: chi square test of homogeneity across groups
Figure XI.7 EXAX2 output: chi square test of homogeneity across groups
Figure XI.8 Input information required for program BMDP1F
Figure XI.9 BMDP1F output from one sided measure of association test


[BMDP1F output listing: TABLE NO. 2, (VAR 2) VS CONC]


Figure XI.10 BMDP1F output from one sided measure of association test on artificial data
Figure XI.11 BMDP1F output from one sided measure of association test on artificial data
Figure XI.12 BMDP1F output on real data
Figure XI.13 BMDP1F output on real data
Figure XI.15 BMDP1F output on real data


XII.  TREATMENT  GROUP  VS  CONTROL  GROUP  PAIRWISE  MULTIPLE  COMPARISON 
PROCEDURES 


If the overall test rejects the hypothesis of no concentration-related effects, we must determine which treatment group response rates differ from the control group rate. A number of procedures can be used for such inferences, some based on hypothesis testing and some based on confidence interval estimation. In this section we consider several approaches based on tests of hypotheses; the discussion is by no means exhaustive. In the following section we discuss confidence interval procedures.


A common approach to multiple comparisons on qualitative response rate data, such as mortality rates, is to apply an arc sine normalizing transformation to the observed response rate within each group and then compare each treatment group with the control using Dunnett's or Williams' procedures [34, 35, 36, 37]. Such procedures are based on asymptotic theory, whose validity is questionable if a number of the expected frequencies are small.

An alternative multiple testing approach is to carry out a succession of 2 x 2 contingency table tests of homogeneity between each treatment group and the control group, based on Fisher's exact test [13] or on asymptotic theory, depending on the expected frequencies. Our EXAX2 program will do this. A treatment group is said to be (statistically) significantly different from the control group at, e.g., the α = 0.05 level if the pairwise test rejects the null hypothesis after adjusting for simultaneity by Bonferroni's method. (For example, if we perform five pairwise comparisons and wish to guarantee an overall type I error level of α = 0.05, then each individual comparison must be made at the α/5 = 0.01 level.) Note that this approach does not impose any monotonicity structure on the response rates, and so it may not be the most sensitive way to detect small to moderate effects.
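The pairwise exact-test scheme just described can be sketched in a few lines. This is a minimal sketch, not EXAX2: it assumes one-sided tests (treatment mortality greater than control), computes the hypergeometric tail of Fisher's test directly with `math.comb`, and uses hypothetical function names. The counts are the Holcombe and Phipps compound D fry mortality data (dead out of 100 per group) used in the worked examples of this and the next section.

```python
from math import comb


def fisher_one_sided(dead_ctrl, n_ctrl, dead_trt, n_trt):
    """One-sided Fisher exact p-value for H1: treatment death rate > control rate.

    With both margins fixed, the number of control deaths X is hypergeometric;
    unusually few control deaths favor the alternative, so p = P(X <= observed).
    """
    total = n_ctrl + n_trt
    deaths = dead_ctrl + dead_trt
    smallest = max(0, deaths - n_trt)  # fewest control deaths the margins allow
    tail = sum(comb(deaths, x) * comb(total - deaths, n_ctrl - x)
               for x in range(smallest, dead_ctrl + 1))
    return tail / comb(total, n_ctrl)


def bonferroni_fisher(control, treatments, alpha=0.05):
    """Pair each (dead, n) treatment group's p-value with the Bonferroni level alpha/k."""
    level = alpha / len(treatments)
    return [(fisher_one_sided(control[0], control[1], dead, n), level)
            for dead, n in treatments]


# Holcombe and Phipps compound D fry mortality, groups 2..6 vs control (6/100 dead)
control = (6, 100)
treatments = [(8, 100), (8, 100), (13, 100), (79, 100), (100, 100)]
for group, (p, level) in enumerate(bonferroni_fisher(control, treatments), start=2):
    print(f"group {group}: one-sided p = {p:.4g}, significant: {p < level}")
```

With five comparisons each test is made at the 0.01 level, so group 4 (one-sided p near .07) is not declared different from the control, while groups 5 and 6 are.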

Dunnett [34, 35] presents a procedure for multiple comparison of each of the treatment group responses with the control group response, controlling the overall error rate for all comparisons. His procedure is derived for quantitative responses assumed to have equal variability. He assumes equal replication among the treatment groups, with equal or possibly greater replication of the control group. We might apply this procedure to qualitative response data from toxicity tests after performing an arc sine variance stabilizing transformation on the observed responses.

A problem with applying Dunnett's procedure to data from toxicity tests is that it does not take full account of the structure of the problem: the various treatment groups correspond to increasing toxicant levels. One might therefore assume a monotone (increasing or decreasing) response level with increasing group number. Since Dunnett did not build such a monotonicity assumption into his procedure, it loses some sensitivity.


Williams [36, 37] assumes a monotone response function. He estimates the treatment and control group response rates under the monotonicity constraint and uses these estimates for treatment group versus control group comparisons; see Williams [36] for details. Chew [38, pp. 26-27] briefly describes Williams' method and presents tables for its implementation. Williams [36] assumes equal replication for all concentrations (including the control). He extends this procedure [37] to accommodate increased replication in the control group, two sided tests, and modifications to account for unequal replication among the treatment groups. As with Dunnett's procedure, we can apply Williams' method to qualitative response data after carrying out an arc sine transformation.

We illustrate Williams' method with several examples based on results from fish toxicity tests. Consider first the fry mortality data from the Holcombe and Phipps test on compound D. From the preliminary scatterplot in Figure VI.6 and the overall tests of significance in Section XI, it is evident that fry mortality increases with increasing toxicant level. We wish to determine here which treatment groups exhibit significantly greater fry mortality rates than the control group.

As the result of the within groups heterogeneity test was marginal (significance level 0.14; see Section VIII), we do not adjust the data prior to carrying out Williams' procedure.

The basic and transformed responses, pooled across tanks within groups, are:

Group (i)                           1       2       3       4       5       6
Sample size ($n_i$)               100     100     100     100     100     100
Response rate ($\hat p_i$)       0.06    0.08    0.08    0.13    0.79    1.00
$2\arcsin\sqrt{\hat p_i}$       0.495   0.574   0.574   0.738   2.190   3.142
Since these estimates are already in monotone sequence, they do not need to be modified. We declare the group $i$ response rate to be significantly different from the control rate if

$$\tilde y_i - y_1 > t\,(2/n)^{1/2},$$

where $y_i = 2\arcsin\sqrt{\hat p_i}$ and $\tilde y_i$ is its monotone estimate (here $\tilde y_i = y_i$). The factor $t$ can be obtained from Williams' tables corresponding to the 5% or the 1% significance level. The yardstick $t(2/n)^{1/2}$ is based on the assumption that the variance of $2\arcsin\sqrt{\hat p}$ is $1/n$. In our example $n = 100$ and $t = 1.756$ (corresponding to 5 treatment groups and α = 0.05). Thus the response in group $i$ is declared to differ significantly from the control group response if

$$\tilde y_i > 0.495 + 1.756\,(2/100)^{1/2} = 0.743.$$

Groups 5 and 6 differ significantly from the control, and group 4 (0.738) is just on the borderline.
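The arithmetic of this example can be checked with a short script. It is a sketch rather than Williams' full procedure: it assumes the transformed rates are already monotone (true here, so no amalgamation is needed) and takes the tabled value t = 1.756 quoted above as an input instead of computing it.

```python
import math


def arcsine(p):
    """Variance-stabilizing transform 2*arcsin(sqrt(p)); its variance is about 1/n."""
    return 2.0 * math.asin(math.sqrt(p))


rates = [0.06, 0.08, 0.08, 0.13, 0.79, 1.00]  # groups 1 (control) to 6
n = 100                                      # fish per group, pooled over tanks
t_williams = 1.756                           # Williams' tabled value used in the text

y = [arcsine(p) for p in rates]
# A difference of two transformed rates has variance 2/n, hence the yardstick t*sqrt(2/n)
cutoff = y[0] + t_williams * math.sqrt(2.0 / n)
flagged = [i + 1 for i in range(1, len(y)) if y[i] > cutoff]

print([round(v, 3) for v in y])
print(round(cutoff, 3), flagged)
```

The flagged groups are 5 and 6; group 4 falls just below the cutoff.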

We now examine the effect on the outcome of this procedure of applying an adjustment for tank to tank heterogeneity. From Section IX B, the adjustment factor for the above data is K = 1.337. Thus the "effective" sample size per group is 100/1.337 = 74.79, and the decision point for Williams' procedure becomes $0.495 + 1.756\,(2/74.79)^{1/2} = 0.782$. Group 4 is no longer borderline.
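The heterogeneity adjustment only changes the sample size that enters the yardstick. A minimal sketch, assuming (as in the text) that the adjustment factor K divides the nominal group size; the helper name `williams_cutoff` is hypothetical.

```python
import math


def williams_cutoff(y_control, t, n, K=1.0):
    """Decision point y_control + t*sqrt(2/n_eff), with effective size n_eff = n/K.

    K = 1 means no adjustment for tank to tank heterogeneity.
    """
    n_eff = n / K
    return y_control + t * math.sqrt(2.0 / n_eff)


unadjusted = williams_cutoff(0.495, 1.756, 100)
adjusted = williams_cutoff(0.495, 1.756, 100, K=1.337)
print(round(unadjusted, 3), round(adjusted, 3))  # group 4 has y = 0.738
```

Inflating the yardstick by the factor $K^{1/2}$ moves the cutoff from 0.743 to 0.782, well above group 4's transformed rate.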

We now apply the same procedure to the embryo mortality data from the Jarvinen test on compound B. The result of the within groups heterogeneity test was highly significant (significance level 0.005), so we first adjust the data prior to carrying out Williams' procedure. From Section IX B the adjustment factor is K = 3.071. The basic and transformed responses, along with effective sample sizes, pooled across tanks within groups, are:


Group (i)                                    1       2       3       4       5       6
Effective sample size ($n_i/K$)          32.89   35.49   32.56   31.91   34.19   32.89
Response rate ($\hat p_i$)               0.139   0.073   0.090   0.020   0.057   0.079
$2\arcsin\sqrt{\hat p_i} = y_i$          0.764   0.547   0.609   0.284   0.482   0.570

For the sake of simplicity we use an average effective sample size of 33.32 within each group, but the calculation could alternatively be carried out using the individual group sample sizes. Since the $\{y_i\}$ are not in monotone sequence, we must first modify them by the averaging process discussed in Williams [36] or in Chew [38] until the resulting estimates satisfy the monotonicity constraint. We obtain $\tilde y_i$ = 0.537, 0.537, 0.537, 0.537, 0.537, 0.570. We declare group $i$ significantly greater than the control group if

$$\tilde y_i > 0.764 + 1.756\,(2/33.32)^{1/2} = 1.194.$$

Obviously no treatment group has a significantly greater response than the control group. (Interestingly, if we carry out Williams' procedure on these data to look for a monotone decreasing trend in response rate, we arrive at the same conclusion: no group has a significantly lower response rate than the control group.)
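The averaging step used above can be sketched with the pool-adjacent-violators algorithm, which gives the weighted least-squares nondecreasing fit. This sketch rests on two assumptions taken from the worked example rather than from Williams [36] directly: all six groups, control included, enter the averaging with equal weights (the common effective sample size 33.32), and the comparison uses the unsmoothed control value 0.764. Unequal weights, such as the individual effective sample sizes, could be passed as `w`.

```python
import math


def pava_increasing(y, w=None):
    """Pool adjacent violators: weighted least-squares nondecreasing fit to y."""
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block holds [weighted mean, total weight, number of groups]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge with the previous block while the nondecreasing order is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


# Jarvinen compound B embryo mortality, transformed rates for groups 1 (control) to 6
y = [0.764, 0.547, 0.609, 0.284, 0.482, 0.570]
smoothed = pava_increasing(y)
cutoff = y[0] + 1.756 * math.sqrt(2.0 / 33.32)
print([round(v, 3) for v in smoothed])
print(round(cutoff, 3), [i + 1 for i in range(1, 6) if smoothed[i] > cutoff])
```

The first five values pool to 0.537 and the last stays at 0.570, matching the amalgamated estimates in the text; none exceeds 1.194.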

Dunnett's and Williams' procedures are based on asymptotic theory. If the response frequencies do not justify the use of asymptotic theory, we can carry out a succession of exact, small sample 2 x 2 treatment-control comparisons by means of Fisher's exact test, adjusting for simultaneity by Bonferroni's method. Consider, for example, the comparison of treatment group 4 with the control group for the fry mortality data from Holcombe and Phipps' test on compound D. We have the following 2 x 2 table.


            CONTROL   GROUP 4   TOTAL
DEAD            6        13        19
LIVE           94        87       181
TOTAL         100       100       200

Here, in the notation of Lieberman and Owen,

k = 19, n = 100, N = 200, x = 6.

Interpolating linearly in 1/N in the Lieberman and Owen tables [26] between N = 100 and N = ∞, we have

N = 100:   P(X ≤ 6) = .062,    1/N = .01
N = ∞:     P(X ≤ 6) = .0835,   1/N = 0
N = 200:   P(X ≤ 6) = ?,       1/N = .005

Since 1/N = .005 lies midway between 0 and .01, P(X ≤ 6) ≈ ½(.0835 + .062) = .073.

Thus this table is significant at the .07 level (one sided, not accounting for simultaneity).

This exact test procedure is thus seen to be somewhat less sensitive than Williams' procedure applied to the same data. This is understandable, since it does not incorporate the monotonicity structure of the response rates.


XIII. CONFIDENCE INTERVAL PROCEDURES FOR COMPARISON OF TREATMENT GROUP
AND CONTROL GROUP RESPONSE RATES

A.  Introduction 


We have previously considered overall tests of hypotheses to compare response rates in replicate tanks within treatment groups and to compare response rates across treatment groups. In this section we consider procedures for constructing confidence intervals to compare the response rates in the treatment groups with that in the control group on a pairwise basis.

It is well known that hypothesis testing procedures are somewhat limited in their conclusions. They merely state whether the null hypothesis was accepted or rejected, but give no indication of the extent of the effect. Thus we have no idea of the biological significance of the effect (as opposed to its statistical significance). The rejection or nonrejection of a null hypothesis is often more a result of sample size than of the biological importance of the effect. The determination of acceptable concentrations should be based on what are biologically significant effects rather than on the power function of a hypothesis testing procedure.

Confidence intervals are more informative than tests of hypotheses. The widths of the confidence intervals indicate the degree of precision in the data concerning the estimates of the quantities of interest in our inferences. Narrow confidence intervals signify precise inferences, while wide confidence intervals signify imprecise inferences.

In the discussion in this section we consider the case of no tank to tank heterogeneity within groups. Thus we pool responses across tanks within groups to arrive at average response rates within groups. The presence of tank to tank heterogeneity can be accounted for by

1. Fitting a model that explicitly accounts for heterogeneity of response rates across tanks, for example the beta binomial extension of the binomial model, the negative binomial extension of the Poisson model, or a variance components extension of a fixed effects analysis of variance model for quantitative responses.

2. Carrying out analyses on a per tank basis rather than on a per fish basis. This approach is conservative and greatly diminishes the number of degrees of freedom available for error estimation.

3. Adjusting the data to account for the extent of tank to tank variation. Tank to tank variation can be regarded as correlation of responses within tanks, generally positive correlation. Thus the variability of the average responses within tanks is greater than it would be if the responses were independent within tanks. This excess variation can be accounted for simply by reducing the "effective" sample size within tanks to a lesser value, then ignoring the within tank correlation and proceeding with binomial based procedures or the like. The reduction in "effective" sample size reduces the precision of the estimates and test statistics, just as the correlation does.

The  procedures  discussed  in  this  section,  although  based  on  binomial 
theory,  can  be  used  in  conjunction  with  adjustment  method  3.  Thus  they 
are  also  relevant  in  the  case  when  tank  to  tank  heterogeneity  exists. 

Consider the Holcombe and Phipps compound D fry mortality data. We wish to compare the response rate in treatment group 4 with that in treatment group 1 (the control group). The basic data, pooled across tanks within groups, are

            CONTROL   GROUP 4
DEAD            6        13
LIVE           94        87
TOTAL         100       100

Fisher's exact test (without simultaneity adjustment) says that $p_4$ is "significantly" greater than $p_1$ at the α = 0.07 level. However, a significance statement such as this says nothing about the magnitude of $p_4/p_1$. Estimating the value of this ratio is important for assessing whether there is a biologically significant increase in mortality between the control group and group 4. Confidence interval procedures enable us to estimate $p_4/p_1$ and determine the precision of our estimate, as well as to determine whether $p_4$ is (statistically) significantly greater than $p_1$.

There are three approaches to the construction of confidence intervals in the case of quantal response data.

• Large sample normal theory confidence intervals.

• Exact, small sample confidence intervals based on the noncentral distribution of the 2 x 2 contingency table, conditional on the margins. (See Thomas [39] for the theory and the algorithm.) We have implemented this algorithm in EXAX2 [14].

• Approximate confidence intervals based on Poisson theory. These intervals are most appropriate when the response probabilities are small (usually under .10).

It should be noted that these procedures do not take the monotonic nature of the response probabilities into account. We consider each of these approaches in turn.


B.  Method  1  Asymptotic  Approach 

To use the asymptotic normal approach we adopt a conservative yardstick and require that each cell in the 2 x 2 table under consideration contain at least 5 responses.

For most situations of practical interest both $p_1$ and $p_4$ will be relatively far from 1. Certainly if $p_1$, the mortality rate in the control group, is close to 1, the test will be terminated. If $p_4$ is very close to 1 while $p_1$ is close to 0, there is no need to calculate confidence intervals on their ratio: group 4 will obviously be unsatisfactory.


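We wish to calculate an asymptotic theory confidence interval on the ratio

$$\theta = p_4/p_1 .$$

Let

$$\phi = \ln\theta = \ln p_4 - \ln p_1 .$$

We estimate $\phi$ by

$$\hat\phi = \ln\hat p_4 - \ln\hat p_1 .$$

As $N_1, N_4 \to \infty$ with $p_1, p_4$ fixed,

$$\hat\phi \ \text{is approximately}\ N\!\left(\phi,\ \frac{q_1}{N_1p_1} + \frac{q_4}{N_4p_4}\right), \qquad q_i = 1 - p_i .$$

Thus an approximate 95% confidence interval on $\phi$ is

$$\hat\phi - 1.96\left[\frac{q_1}{N_1p_1} + \frac{q_4}{N_4p_4}\right]^{1/2} < \phi < \hat\phi + 1.96\left[\frac{q_1}{N_1p_1} + \frac{q_4}{N_4p_4}\right]^{1/2}.$$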
In the case of the Holcombe and Phipps example,

$N_1 = N_4 = 100$, $\hat p_1 = 6/100 = 0.06$, $\hat q_1 = 1 - .06 = .94$, $\hat p_4 = 13/100 = 0.13$, $\hat q_4 = 1 - .13 = .87$,

so that $\hat\phi = \ln 0.13 - \ln 0.06 = 0.773$. Substituting $\hat p_1, \hat q_1, \hat p_4, \hat q_4$ for the corresponding parameters in the standard error formula, we have

$$\mathrm{stderr}(\hat\phi) = \left[\frac{.94}{100(.06)} + \frac{.87}{100(.13)}\right]^{1/2} = .473 .$$

Thus an approximate 95% confidence interval on $\phi$ is

(0.773 - 1.96(.473), 0.773 + 1.96(.473)) = (-.154, 1.700).

Therefore $(e^{-.154}, e^{1.700})$ is an asymptotic 95% confidence interval on $\theta \equiv p_4/p_1$. This interval is

(0.857, 5.474).

The conclusions from this confidence interval calculation are

• $p_4$ is not "significantly" different from $p_1$ at the .05 level, since the confidence interval contains 1. (Note that we observed borderline significance with Williams' procedure at α = 0.05.)

• $p_4$ is not very much smaller than $p_1$ (at least 86% of $p_1$) but may be much larger than $p_1$ (as much as 5.5 times $p_1$).

• $p_4/p_1$ is not determined very precisely by the data, based on such a comparison.

We have thus quantified the relation between $p_1$ and $p_4$.
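The large-sample interval just computed, and those for groups 2 and 5 that follow, can be reproduced with a few lines. A sketch assuming the delta-method variance $q_1/(N_1p_1) + q_j/(N_jp_j)$ for $\ln(\hat p_j/\hat p_1)$ stated above; the function name is hypothetical, and results differ from the text in the third decimal because the text rounds intermediate values.

```python
import math


def log_ratio_ci(dead_trt, n_trt, dead_ctrl, n_ctrl, z=1.96):
    """Large-sample CI on theta = p_trt/p_ctrl built on phi = ln(theta).

    Var(phi_hat) ~ q_trt/(n_trt p_trt) + q_ctrl/(n_ctrl p_ctrl); needs nonzero counts.
    """
    p_t, p_c = dead_trt / n_trt, dead_ctrl / n_ctrl
    phi = math.log(p_t) - math.log(p_c)
    se = math.sqrt((1 - p_t) / (n_trt * p_t) + (1 - p_c) / (n_ctrl * p_c))
    return phi, se, (math.exp(phi - z * se), math.exp(phi + z * se))


# Holcombe and Phipps compound D fry mortality: control 6/100 dead
for group, dead in [(2, 8), (4, 13), (5, 79)]:
    phi, se, (lo, hi) = log_ratio_ci(dead, 100, 6, 100)
    print(f"group {group}: phi={phi:.3f} se={se:.3f} CI on theta=({lo:.3f}, {hi:.3f})")
```

Working on the log scale keeps the interval positive and makes it asymmetric about the point estimate, as the intervals in the text are.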

We  now  calculate  95%  confidence  intervals  to  compare  the  response 
rates  in  each  of  the  other  treatment  groups  with  that  in  the  control 
group. 
            CONTROL   GROUP 2   GROUP 3   GROUP 4   GROUP 5   GROUP 6
DEAD            6         8         8        13        79       100
LIVE           94        92        92        87        21         0
TOTAL         100       100       100       100       100       100

Holcombe and Phipps Compound D Fry Mortality Data Pooled Across Tanks within Treatment Groups

We  are  not  particularly  interested  in  comparing  group  6  with  the  control 
since  it  is  obviously  inferior.  We  thus  compare  groups  2,  3,  and  5  with 
the  control  group  by  means  of  asymptotic  95%  confidence  intervals. 

Group 2 vs Control

$$\theta \equiv p_2/p_1, \qquad \phi = \ln\theta = \ln p_2 - \ln p_1$$

$$\hat\phi = \ln\hat p_2 - \ln\hat p_1 = \ln 0.08 - \ln 0.06 = 0.288$$

$$\mathrm{stderr}(\hat\phi) = \left[\frac{\hat q_2}{N_2\hat p_2} + \frac{\hat q_1}{N_1\hat p_1}\right]^{1/2} = \left[\frac{.92}{100(.08)} + \frac{.94}{100(.06)}\right]^{1/2} = 0.521$$

$$\hat\phi \pm 1.96\,\mathrm{stderr}(\hat\phi) = 0.288 \pm 1.96(0.521) = (-.733,\ 1.309)$$

Thus an asymptotic 95% confidence interval on θ is $(e^{-.733}, e^{1.309}) = (0.480, 3.703)$. This implies that there is no statistical evidence at the .05 level of a difference between $p_2$ and $p_1$. Furthermore, the present data do not determine this ratio very precisely.


Group 5 vs Control

$$\hat\phi = \ln\hat p_5 - \ln\hat p_1 = \ln 0.79 - \ln 0.06 = 2.578$$

$$\mathrm{stderr}(\hat\phi) = \left[\frac{.21}{100(.79)} + \frac{.94}{100(.06)}\right]^{1/2} = (.159)^{1/2} = 0.399$$

$$\hat\phi \pm 1.96\,\mathrm{stderr}(\hat\phi) = (1.794,\ 3.362)$$

Thus an asymptotic 95% confidence interval on $\theta \equiv p_5/p_1$ is $(e^{1.794}, e^{3.362}) = (6.011, 28.859)$. There is thus overwhelming statistical evidence that the response rate in group 5 is substantially greater than that in the control group, by at least a factor of 6. The interval, however, is very wide, so we cannot determine the ratio very precisely.


We may wish to modify these intervals for simultaneity. Since we are calculating 4 confidence intervals, we can adjust their levels to attain a familywise error rate of 0.05 (familywise confidence level 0.95). The simplest way to do this is by means of Bonferroni's inequality: we construct each interval at individual confidence level 1 - (.05/4) = .9875. The appropriate normal distribution factor then becomes 2.50, and the intervals are

$$\exp\left\{\ln(\hat p_j/\hat p_1) \pm 2.50\left[\frac{\hat q_j}{N\hat p_j} + \frac{\hat q_1}{N\hat p_1}\right]^{1/2}\right\}, \qquad j = 2, 3, 4, 5.$$


These intervals are:

Group 2 vs Control     (.363, 4.906)
Group 3 vs Control     (.363, 4.906)
Group 4 vs Control     (.664, 7.067)
Group 5 vs Control     (4.857, 35.712)
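A sketch of the Bonferroni adjustment, assuming k = 4 intervals and a familywise error rate of 0.05. The normal quantile is computed with Python's `statistics.NormalDist` rather than rounded to 2.50, so the limits agree with the table above only to within rounding; the function name is hypothetical.

```python
import math
from statistics import NormalDist

k, alpha = 4, 0.05
z = NormalDist().inv_cdf(1 - alpha / (2 * k))  # about 2.498; the text rounds to 2.50


def log_ratio_ci(dead_trt, dead_ctrl, n=100, z=z):
    """Large-sample CI on p_trt/p_ctrl via the log ratio, equal group sizes n."""
    p_t, p_c = dead_trt / n, dead_ctrl / n
    phi = math.log(p_t / p_c)
    se = math.sqrt((1 - p_t) / (n * p_t) + (1 - p_c) / (n * p_c))
    return math.exp(phi - z * se), math.exp(phi + z * se)


intervals = {g: log_ratio_ci(dead, 6) for g, dead in [(2, 8), (3, 8), (4, 13), (5, 79)]}
for g, (lo, hi) in intervals.items():
    print(f"Group {g} vs Control: ({lo:.3f}, {hi:.3f})")
```

Only the multiplier changes relative to the individual 95% intervals; each interval widens on the log scale by the factor 2.50/1.96.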


We thus conclude that there is strong statistical evidence that group 5 has roughly 5 times (at least 4.86 times) the response rate of group 1, but there is not enough statistical evidence to distinguish the response rates in groups 2, 3, and 4 from that in group 1. Furthermore, the data are not sufficient to make precise inferences about the ratios of treatment group to control group response rates without putting further structure on the problem, such as assuming some sort of dose response relation. We will consider this approach in subsequent sections.

C.  Method  2  Exact,  Small  Sample  Confidence  Intervals 

If the sample sizes are not sufficiently large to apply the asymptotic confidence interval procedure (method 1), and if the response proportions are not sufficiently small to apply Poisson theory (method 3), then confidence interval comparisons between treatment groups and the control group can be made by an exact, small sample procedure. This procedure is based on the nonnull distribution of Fisher's exact test in 2 x 2 contingency tables.


Consider a 2 x 2 contingency table to compare the response rate in a particular treatment group with that in the control group.

            Control     Group 2     Total
Dead        X1          X2          X1 + X2
Live        m - X1      n - X2      m + n - (X1 + X2)
Total       m           n           m + n
Let $p_1, p_2$ denote the response probabilities (e.g., probability of death) within the control group and treatment group, respectively. We can test the hypothesis

$$H_0: p_1 = p_2 \quad \text{vs} \quad H_1: p_1 \ne p_2$$

by means of Fisher's exact test (Lehmann [25], Lieberman and Owen [26]), conditional on the margins of the table being fixed. This test is based on the hypergeometric distribution. We reject if $X_2$ is too extreme.

The nonnull distribution of $X_2$, conditional on $X_1 + X_2 = t$, is

$$P(X_2 = x \mid X_1 + X_2 = t) = \frac{1}{C_t(\rho)}\binom{m}{t-x}\binom{n}{x}\rho^{-x}, \qquad x = 0, 1, 2, \ldots, t,$$

where $C_t(\rho)$ is a normalizing constant and

$$\rho = \frac{p_1/q_1}{p_2/q_2}, \qquad q_i = 1 - p_i .$$

The quantity ρ is known as the odds ratio and is very important for power calculations and for calculating confidence intervals to compare the response rates in the treatment and control groups on a pairwise basis.


The odds ratio is a quantity between 0 and ∞. ρ = 1 if and only if $p_1 = p_2$. If ρ > 1 then $p_1/p_2 > 1$, and if ρ < 1 then $p_1/p_2 < 1$. The size of the confidence interval on ρ indicates how precisely this quantity can be estimated from the data.


Thomas [39] presents an algorithm for calculating exact, small sample confidence intervals on ρ based on the distribution of $X_2$, conditional on the margins of the table. We have implemented Thomas' algorithm in EXAX2 [14] and illustrate the calculation of the confidence intervals with several examples.
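Thomas' algorithm itself is not reproduced here. The following is a minimal sketch of the same kind of conditional exact interval: each one-sided tail equation is solved for the odds ratio by bisection on log ρ, with ρ = (p1/q1)/(p2/q2) and group 1 the control. Function names and the `alpha_l`/`alpha_u` arguments are hypothetical stand-ins for EXAX2's ALPHAL and ALPHAU, and the limits need not match EXAX2 output digit for digit.

```python
import math


def cond_tail(x1, m, n, t, log_rho, upper):
    """Conditional tail of X1 (control deaths) given X1 + X2 = t.

    P(X1 = x) is proportional to C(m, x) C(n, t - x) rho**x. Returns
    P(X1 >= x1) if upper else P(X1 <= x1), working on the log scale.
    """
    xs = range(max(0, t - n), min(m, t) + 1)
    logw = [math.log(math.comb(m, x)) + math.log(math.comb(n, t - x)) + x * log_rho
            for x in xs]
    top = max(logw)
    w = [math.exp(v - top) for v in logw]  # rescaled weights avoid overflow
    part = sum(wi for x, wi in zip(xs, w) if (x >= x1 if upper else x <= x1))
    return part / sum(w)


def bisect_log_rho(f, target, increasing, a=-30.0, b=30.0, iters=200):
    """Solve f(log_rho) = target for a monotone f by bisection."""
    for _ in range(iters):
        mid = 0.5 * (a + b)
        if (f(mid) < target) == increasing:
            a = mid
        else:
            b = mid
    return 0.5 * (a + b)


def exact_odds_ratio_ci(x1, m, x2, n, alpha_l=0.025, alpha_u=0.025):
    """Conditional exact CI on rho = (p1/q1)/(p2/q2); group 1 is the control."""
    t = x1 + x2
    if x1 == max(0, t - n):   # smallest possible X1: lower limit is 0
        lower = 0.0
    else:                     # P(X1 >= x1; rho) increases with rho
        lower = math.exp(bisect_log_rho(
            lambda lr: cond_tail(x1, m, n, t, lr, True), alpha_l, True))
    if x1 == min(m, t):       # largest possible X1: upper limit is infinite
        upper = math.inf
    else:                     # P(X1 <= x1; rho) decreases with rho
        upper = math.exp(bisect_log_rho(
            lambda lr: cond_tail(x1, m, n, t, lr, False), alpha_u, False))
    return lower, upper


# Group 4 vs control, Holcombe and Phipps compound D fry mortality
print(exact_odds_ratio_ci(6, 100, 13, 100))
```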

We first consider the Holcombe and Phipps compound D fry mortality data. The output appears in Figure XIII.1. The first page of the output defines the odds ratio explicitly in terms of the order of the groups and the order of the response categories. Subsequent pages present the individual 2 x 2 tables comparing treatment groups with the control group on a pairwise basis, a point estimate and confidence interval on the odds ratio, and the one sided significance level of Fisher's exact test for equality of the two response probabilities.

It should be noted that the quantities ALPHAL and ALPHAU, which specify the probability inequalities governing the upper and lower confidence limits, are under the control of the user. They can be adjusted to yield one sided upper or lower confidence bounds in place of two sided intervals, or to account for simultaneity by means of Bonferroni's method.


In the present example, individual 95% two sided confidence intervals are calculated on the odds ratios of each treatment group with the control group. The conclusions are similar to those arrived at with the asymptotic intervals. Namely, the response rates in groups 2 and 3 cannot be distinguished from that in the control group. The fry mortality rate in group 4 is marginally worse than the control group rate. The lower confidence limit of 0.13 suggests that the fry mortality rate in group 4 could be substantially worse than the control rate. The upper confidence limit of 1.27 is not too far removed from 1.0. This implies that the fry mortality rate in group 4 is not significantly different from that in group 1 at α = 0.05 but would be significant at a slightly higher α-level (α = 0.07 suffices here). The odds ratios comparing the response rates in groups 5 and 6 with that in the control group are very small, as are their upper confidence limits. There is thus strong evidence that these groups have significantly, and substantially, higher fry mortality rates than the control group.


The  large  widths  of  the  confidence  intervals  imply  that  the  odds 
ratios  cannot  be  determined  very  precisely. 

We next consider the Holcombe and Phipps compound D embryo mortality data. The output format is the same as that for the fry mortality data and appears in Figure XIII.2. None of the treatment group response rates is significantly different from the control group rate: the confidence intervals all straddle 1, so the treatment group response rates cannot be distinguished from the control group response rate. This is in conformance with the results of our preliminary analyses.


The previous discussion pertained to the construction of exact, small sample confidence intervals on the odds ratio

$$\rho = \frac{p_1/q_1}{p_2/q_2} .$$

However, ρ has no direct physical interpretation. A parameter such as

$$\theta = p_2/p_1$$

is more physically meaningful. How can we construct confidence intervals on θ based on the confidence intervals we have constructed on ρ? We can express θ in terms of ρ and $p_1$. Namely

$$\theta = \frac{1}{\rho + p_1(1-\rho)} .$$

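If ρ < 1 then θ decreases as $p_1$ increases from 0 to 1.

If ρ > 1 then θ increases as $p_1$ increases from 0 to 1.

For fixed $p_1$, θ decreases as ρ increases from 0 to ∞.

Suppose $(\underline\rho, \bar\rho)$ is a confidence interval on ρ and suppose $(\underline p_1, \bar p_1)$ is a confidence interval on $p_1$. Then a conservative confidence interval on θ is $(\underline\theta, \bar\theta)$, where

$$\underline\theta = \frac{1}{\bar\rho + p_1^{*}(1-\bar\rho)}, \qquad p_1^{*} = \begin{cases}\underline p_1 & \text{if } \bar\rho > 1\\ \bar p_1 & \text{if } \bar\rho < 1\end{cases}$$

$$\bar\theta = \frac{1}{\underline\rho + p_1^{**}(1-\underline\rho)}, \qquad p_1^{**} = \begin{cases}\underline p_1 & \text{if } \underline\rho < 1\\ \bar p_1 & \text{if } \underline\rho > 1\end{cases}$$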

The confidence interval on the odds ratio ρ comes from the EXAX2 program output. Confidence intervals on $p_1$ can be calculated by the Pearson-Clopper method. Namely, if $\hat p_1 = X_1/N_1$, then

$$\underline p_1 = \left[1 + \frac{N_1 - X_1 + 1}{X_1}\,F(2N_1 - 2X_1 + 2,\ 2X_1;\ 1-\alpha/2)\right]^{-1}, \qquad \underline p_1 = 0 \text{ if } X_1 = 0,$$

$$\bar p_1 = \left[1 + \frac{N_1 - X_1}{X_1 + 1}\cdot\frac{1}{F(2X_1 + 2,\ 2N_1 - 2X_1;\ 1-\alpha/2)}\right]^{-1}, \qquad \bar p_1 = 1 \text{ if } X_1 = N_1 .$$

These confidence intervals are given in chart form; see, for example, Box, Hunter, and Hunter [40], pages 642-643, or Dixon and Massey [13], pages 501-504.
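Instead of reading the charts, the Pearson-Clopper limits can be computed by solving the two binomial tail equations from which they are derived; this is equivalent to the F-quantile formulas above. A minimal sketch using bisection, with hypothetical function names. The computed limits need not agree exactly with values read from the charts, which are coarser.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def bisect_decreasing(g, iters=100):
    """Root in (0, 1) of a function g that decreases from positive to negative."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def pearson_clopper(x, n, conf=0.95):
    """Exact two-sided interval on a binomial proportion from x successes in n trials."""
    a = (1 - conf) / 2
    # Lower limit solves P(X >= x; p) = a; upper limit solves P(X <= x; p) = a
    lower = 0.0 if x == 0 else bisect_decreasing(lambda p: a - (1 - binom_cdf(x - 1, n, p)))
    upper = 1.0 if x == n else bisect_decreasing(lambda p: binom_cdf(x, n, p) - a)
    return lower, upper


# Control group: 6 deaths among 100 fry, 99% two sided interval
print(pearson_clopper(6, 100, conf=0.99))
```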


We apply this conservative procedure to the Holcombe and Phipps compound D fry mortality data and compare the results with those calculated by the asymptotic approach.

In the control group $X_1 = 6$, $N_1 = 100$. Thus $\hat p_1 = 0.06$. A 99% two sided confidence interval on $p_1$ is, from the Pearson-Clopper charts entered at $\hat p = 0.06$, n = 100, $(\underline p_1, \bar p_1) = (0.02, 0.15)$.

The 95% confidence intervals on the odds ratio ρ, namely $(\underline\rho, \bar\rho)$, are

Group 2 vs Control     (0.2018, 2.5238)
Group 3 vs Control     (0.2018, 2.5238)
Group 4 vs Control     (0.1278, 1.2743)
Group 5 vs Control     (0.0055, 0.0467)
Group 6 vs Control     (0, 0.0027)

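Combining these results as discussed previously, we obtain:

Groups 2, 3 vs Control

Since $\bar\rho > 1$ and $\underline\rho < 1$, we have

$$\underline\theta = \frac{1}{\bar\rho + \underline p_1(1-\bar\rho)} = \frac{1}{2.5238 + 0.02(1 - 2.5238)} = 0.401,$$

$$\bar\theta = \frac{1}{\underline\rho + \underline p_1(1-\underline\rho)} = \frac{1}{0.2018 + 0.02(1 - 0.2018)} = 4.592 .$$

This would be a conservative 100(1 - .05 - .01)% = 94% confidence interval.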
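The corner-combination rule can be expressed compactly. Because θ = 1/(ρ + p1(1 - ρ)) is monotone in ρ for fixed p1 and monotone in p1 for fixed ρ, its extremes over the rectangle formed by the two intervals occur at corners, so evaluating all four corners reproduces the case analysis above without choosing $p_1^*$ by hand. A sketch with hypothetical names; the inputs are the EXAX2 odds ratio intervals and the chart-based interval (0.02, 0.15) on p1.

```python
def theta(rho, p1):
    """theta = p2/p1 written in terms of the odds ratio rho = (p1/q1)/(p2/q2)."""
    return 1.0 / (rho + p1 * (1.0 - rho))


def conservative_theta_ci(rho_ci, p1_ci):
    """Min and max of theta over the four corners of rho_ci x p1_ci."""
    values = [theta(r, p) for r in rho_ci for p in p1_ci]
    return min(values), max(values)


p1_ci = (0.02, 0.15)
for label, rho_ci in [("Groups 2, 3", (0.2018, 2.5238)),
                      ("Group 4", (0.1278, 1.2743)),
                      ("Group 6", (0.0, 0.0027))]:
    lo, hi = conservative_theta_ci(rho_ci, p1_ci)
    print(f"{label} vs Control: ({lo:.3f}, {hi:.3f})")
```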


The corresponding 95% confidence interval based on asymptotic normal theory is (0.480, 3.703). We see that the two intervals are qualitatively similar, but that the conservative interval is longer, as would be expected.

We now compare the conservative small sample intervals with the approximate large sample intervals for comparisons of groups 4, 5, and 6 with the control group. The calculations proceed analogously.
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Interval  on  0 


Interval  on  0 


Groups  2,3  vs  Control  (0.401,  4.592)  (0.480,  3.703) 

Group  4  vs  Control  (0.788,  6.885)  (0.857,  5.474) 

Group  5  vs  Control  (5.272,  39.856)  (6.011,  28.859) 

Group  6  vs  Control  (6.566,  50)  large  sample  interval 

not  calculated 

We see that the two sets of intervals are qualitatively similar; however,
the conservative, small sample intervals are 30%-51% longer than
the corresponding asymptotic intervals.
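The 30%-51% figure is simple arithmetic on the interval lengths in the table above; the short sketch below (helper name `excess_length` is hypothetical) recomputes it for the three comparisons where both intervals are available.

```python
# (conservative small sample, asymptotic) intervals on theta, from the table above
intervals = {
    "Groups 2, 3": ((0.401, 4.592), (0.480, 3.703)),
    "Group 4": ((0.788, 6.885), (0.857, 5.474)),
    "Group 5": ((5.272, 39.856), (6.011, 28.859)),
}


def excess_length(conservative, asymptotic):
    """Percentage by which the conservative interval is longer."""
    len_c = conservative[1] - conservative[0]
    len_a = asymptotic[1] - asymptotic[0]
    return 100.0 * (len_c / len_a - 1.0)


for label, (cons, asym) in intervals.items():
    print(f"{label}: {excess_length(cons, asym):.0f}% longer")
```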

An alternative approximation can be used to calculate conservative
confidence intervals on $\theta = p_2/p_1$. Consider again the 2x2 table.


          CONTROL           GROUP 2
Dead      X1                X2
Live      Y1 = N1 - X1      Y2 = N2 - X2
Total     N1                N2


Let $p_1$, $p_2$ denote the probabilities of death in groups 1, 2 respectively.
We wish to construct a confidence interval on $p_2/p_1 = \theta$.

Now $N_1$, $N_2$ were fixed by the experimenter. Let $r = N_2/N_1$. Suppose
we adopt the fiction that $N_1 \sim Po(\lambda)$, $N_2 \sim Po(r\lambda)$ and that $N_1$, $N_2$ in the
data are realizations of these two independent random variables. Then
$X_1$, $X_2$, $Y_1$, $Y_2$ can be treated as independent Poisson random variables
with means $p_1\lambda$, $p_2 r\lambda$, $q_1\lambda$, $q_2 r\lambda$ respectively. Confidence intervals on
$p_2/p_1$ can be constructed by methods like those discussed in connection
with the Poisson approximation approach (method 3). Namely


$$P\left\{\frac{X_2}{X_1 + 1}\cdot\frac{1}{F(2X_1 + 2,\ 2X_2;\ 1 - \alpha_1)}\cdot\frac{1}{r} \le \frac{p_2}{p_1} \le \frac{X_2 + 1}{X_1}\,F(2X_2 + 2,\ 2X_1;\ 1 - \alpha_2)\cdot\frac{1}{r}\right\} \ge 1 - \alpha$$

where $\alpha_1 + \alpha_2 = \alpha$. Now these confidence intervals are conservative because
we are introducing additional variability by assuming that $N_1$, $N_2$ are
random variables rather than fixed constants. The variances of $X_1$, $X_2$
are inflated from $N_1 p_1 q_1$, $N_2 p_2 q_2$ to $N_1 p_1$, $N_2 p_2$ by this assumption. Thus
the greater are $p_1$, $p_2$, the more conservative this procedure will be.


We illustrate the application of these intervals with the Holcombe
and Phipps compound D fry mortality data and the DeFoe 1,1,2-trichloroethane
fry mortality data.

First consider the Holcombe and Phipps compound D fry mortality data.
The comparisons of Groups 2, 3, 4 vs Control, based on the Poisson approximation,
are quite similar to the conservative small sample confidence
intervals discussed earlier in this subsection.


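Now consider comparisons of Groups 5, 6 with the Control group.

Group 5 vs Control: $X_5 = 79$, $X_1 = 6$, $N_5 = N_1 = 100$, $\alpha_1 = \alpha_2 = 0.025$.

$$\left(\frac{79}{7}\cdot\frac{1}{F(14,\ 158;\ .975)},\ \ \frac{80}{6}\,F(160,\ 12;\ .975)\right) = \left(\frac{79}{7}\cdot\frac{1}{1.94},\ \ \frac{80}{6}(2.77)\right) = (5.82,\ 36.93)$$

is an approximate 95% confidence interval on $p_5/p_1$.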

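Group 6 vs Control: $X_6 = 100$, $X_1 = 6$, $N_6 = N_1 = 100$, $\alpha_1 = \alpha_2 = 0.025$.

$$\left(\frac{100}{7}\cdot\frac{1}{F(14,\ 200;\ .975)},\ \ \frac{101}{6}\,F(202,\ 12;\ .975)\right) = \left(\frac{100}{7}\cdot\frac{1}{1.79},\ \ \frac{101}{6}(2.75)\right) = (7.98,\ 46.29)$$

is an approximate 95% confidence interval on $p_6/p_1$.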

These intervals compare with the conservative, small sample intervals
calculated earlier as follows:

                      Conservative, Small Sample   Approximate Poisson
Group 5 vs Control    (5.27, 39.86)                (5.82, 36.93)
Group 6 vs Control    (6.57, 50)                   (7.98, 46.29)

These intervals are seen to be quite similar.
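The arithmetic for these two comparisons can be checked with a short sketch. It is a minimal illustration, assuming the F quantiles are supplied by hand from a table exactly as in the text (1.94, 2.77, 1.79, 2.75); the function name `poisson_ratio_interval` is hypothetical, and it handles only the two sided case with $X_1 > 0$ and $X_2 > 0$.

```python
def poisson_ratio_interval(x1, x2, n1, n2, f_lower, f_upper):
    """Two sided interval on p2/p1 from the Poisson-ratio formula.

    f_lower = F(2*x1 + 2, 2*x2; 1 - alpha1) and
    f_upper = F(2*x2 + 2, 2*x1; 1 - alpha2) are tabled F quantiles.
    """
    if x1 <= 0 or x2 <= 0:
        raise ValueError("two sided formula needs x1 > 0 and x2 > 0")
    scale = n1 / n2  # 1/r: converts a bound on lambda2/lambda1 to one on p2/p1
    lower = x2 / (x1 + 1) / f_lower * scale
    upper = (x2 + 1) / x1 * f_upper * scale
    return lower, upper


for label, x2, f_lo, f_hi in [("Group 5", 79, 1.94, 2.77),
                              ("Group 6", 100, 1.79, 2.75)]:
    lo, hi = poisson_ratio_interval(6, x2, 100, 100, f_lo, f_hi)
    print(f"{label} vs Control: ({lo:.2f}, {hi:.2f})")
```

With the tabled quantiles this returns the approximate Poisson intervals shown in the comparison above.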
We now consider the DeFoe 1,1,2-trichloroethane data and calculate
approximate confidence intervals to compare Groups 5, 6 to the control
group. Since $X_1 = 0$ we can only calculate lower confidence bounds.

Group 5 vs Control: $X_5 = 9$, $X_1 = 0$, $N_5 = N_1 = 40$, $\alpha_1 = 0.05$, $\alpha_2 = 0$.

Thus

$$\frac{9}{1}\cdot\frac{1}{F(2,\ 20;\ .95)} = \frac{9}{3.49} = 2.58$$

is a 95% lower confidence bound on $p_5/p_1$.

Group 6 vs Control: $X_6 = 40$, $X_1 = 0$, $N_6 = N_1 = 40$, $\alpha_1 = 0.05$, $\alpha_2 = 0$.

Thus

$$\frac{40}{1}\cdot\frac{1}{F(2,\ 82;\ .95)} = \frac{40}{3.12} = 12.82$$

is a 95% lower confidence bound on $p_6/p_1$.

Thus there is strong statistical evidence that the response rates in
groups 5 and 6 are substantially greater than that in the control group.
The response rate in group 5 is at least 2½ times that in the control
group.

D.  Method  3  Poisson  Approximation 

We now consider method 3 for placing confidence intervals on ratios
of parameters. This method is based on the Poisson approximation to the
binomial distribution and so requires that each p be less than 0.1 or
that each p be greater than 0.9 in order that the Poisson approximation
be reasonably accurate. Operationally, we will use this approximation
if each $\hat p$ is less than 0.1 or if each $\hat p$ is greater than 0.9. The prototype
situation is


          Control      Group 2
Dead      X1           X2
Live      N1 - X1      N2 - X2
Total     N1           N2

Let $p_1$, $p_2$ denote the response probabilities in groups 1, 2 respectively.
We wish to construct $1 - \alpha$ confidence intervals on $p_2/p_1$.


Let $\lambda_1 = N_1 p_1$, $\lambda_2 = N_2 p_2$.

Then if $p_1 < .1$, $p_2 < .1$, approximately $X_1 \sim Po(\lambda_1)$, $X_2 \sim Po(\lambda_2)$. We can thus pose the problem
as one of placing confidence intervals on the ratio of two Poisson means.
If $p_1 > .9$, $p_2 > .9$ it is probably of more interest to place a confidence
interval on the ratio $q_2/q_1$, where $q_1 = 1 - p_1$, $q_2 = 1 - p_2$. We are then
back in the above situation.

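Nelson [24] shows that a $1 - \alpha$ confidence interval on $\lambda_2/\lambda_1$ is

$$\left(\frac{X_2}{X_1 + 1}\cdot\frac{1}{F(2X_1 + 2,\ 2X_2;\ 1 - \alpha_1)},\ \ \frac{X_2 + 1}{X_1}\,F(2X_2 + 2,\ 2X_1;\ 1 - \alpha_2)\right)$$

where $F(\nu_1, \nu_2; \gamma)$ represents the $\gamma$ quantile (the upper $1 - \gamma$ point) of the F-distribution
with $\nu_1$, $\nu_2$ d.f., and $\alpha_1 + \alpha_2 = \alpha$. Now

$$\lambda_2/\lambda_1 = (N_2 p_2)/(N_1 p_1) = (N_2/N_1)(p_2/p_1).$$

Thus multiplying the above confidence bounds by the factor $N_1/N_2$ yields
confidence bounds on $p_2/p_1$. Namely

$$\left(\frac{X_2}{X_1 + 1}\cdot\frac{1}{F(2X_1 + 2,\ 2X_2;\ 1 - \alpha_1)}\cdot\frac{N_1}{N_2},\ \ \frac{X_2 + 1}{X_1}\,F(2X_2 + 2,\ 2X_1;\ 1 - \alpha_2)\cdot\frac{N_1}{N_2}\right)$$

is a $1 - \alpha$ confidence interval on $p_2/p_1$. Often we take $\alpha_1$, $\alpha_2$ to be
$\alpha/2$. However for one sided confidence intervals we take $\alpha_1 = \alpha$, $\alpha_2 = 0$
or $\alpha_1 = 0$, $\alpha_2 = \alpha$.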

If $X_1 = 0$ or if $X_2 = 0$ we have only one sided information about
$p_1$, $p_2$ respectively. Thus we can only construct one sided confidence
bounds on their ratio. Namely, if $X_1 > 0$, $X_2 = 0$, then set the lower confidence
bound equal to 0; the upper confidence bound on $p_2/p_1$ becomes

$$\frac{1}{X_1}\,F(2,\ 2X_1;\ 1 - \alpha)\cdot\frac{N_1}{N_2} \qquad \text{if } X_2 = 0,\ X_1 > 0.$$

If $X_1 = 0$, $X_2 > 0$, then we can only get a lower bound on $p_2/p_1$. Set
the upper bound equal to $\infty$; the lower confidence bound becomes

$$\frac{X_2}{F(2,\ 2X_2;\ 1 - \alpha)}\cdot\frac{N_1}{N_2} \qquad \text{if } X_1 = 0,\ X_2 > 0.$$

If $X_1 = 0$, $X_2 = 0$ the problem is indeterminate.

Nelson [24] presents charts which facilitate the construction of
two sided 90%, 95% or 99% confidence intervals on $\lambda_2/\lambda_1$. However his
charts do not apply for the situation when $X_1 = 0$ or $X_2 = 0$. In fact
they effectively apply only when $0.1 < X_2/X_1 < 10$. The charts are shown
in Figures XIII.3, XIII.4, XIII.5.

To use the Nelson charts:

1. Enter the value of $X_2/X_1$ on the horizontal axis.

2. Go up to the curve labelled with the $X_1$ value. (There are two
sets of curves, corresponding to upper and lower confidence
limits.)

3. Read the upper and lower limits on the vertical scale.

4. Multiply the resulting limits by the ratio $N_1/N_2$.
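As a chart-free cross-check, the same bounds can be computed directly. The sketch below relies on a standard identity that the report does not state: conditional on $X_1 + X_2 = n$, $X_2$ is binomial with $\pi = \lambda_2/(\lambda_1 + \lambda_2)$, and the F-based limits above are exactly the Clopper-Pearson limits on $\pi$ transformed by $\pi/(1 - \pi)$. It finds those limits by bisection on binomial tail probabilities using only the Python standard library, then multiplies by $N_1/N_2$. All function names are hypothetical. A zero $\alpha_1$ or $\alpha_2$ gives the trivial bound 0 or $\infty$ on that side, so the one sided cases with $X_1 = 0$ or $X_2 = 0$ are covered, and $X_1 = X_2 = 0$ is reported as indeterminate (`None`).

```python
import math


def binom_tail_ge(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), 0 < p < 1, summed in log space."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for j in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(j + 1)
                    - math.lgamma(n - j + 1) + j * log_p + (n - j) * log_q)
        total += math.exp(log_term)
    return total


def solve_monotone(f, target: float, increasing: bool, tol: float = 1e-12) -> float:
    """Bisection for f(p) = target on (0, 1), with f monotone in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (f(mid) < target) == increasing:
            lo = mid  # root lies above mid
        else:
            hi = mid  # root lies below mid
    return 0.5 * (lo + hi)


def clopper_pearson(x: int, n: int, alpha_lo: float, alpha_hi: float):
    """Clopper-Pearson limits; a zero alpha gives the trivial bound on that side."""
    if x == 0 or alpha_lo == 0:
        lower = 0.0
    else:  # P(X >= x; p) increases with p
        lower = solve_monotone(lambda p: binom_tail_ge(x, n, p), alpha_lo, True)
    if x == n or alpha_hi == 0:
        upper = 1.0
    else:  # P(X <= x; p) = 1 - P(X >= x + 1; p) decreases with p
        upper = solve_monotone(lambda p: 1.0 - binom_tail_ge(x + 1, n, p),
                               alpha_hi, False)
    return lower, upper


def poisson_ratio_bounds(x1: int, x2: int, n1: int, n2: int,
                         alpha1: float, alpha2: float):
    """Bounds on p2/p1 under the Poisson approximation; None if X1 = X2 = 0."""
    total = x1 + x2
    if total == 0:
        return None  # indeterminate, as in the text
    pi_lo, pi_hi = clopper_pearson(x2, total, alpha1, alpha2)
    scale = n1 / n2  # converts lambda2/lambda1 to p2/p1
    lower = pi_lo / (1.0 - pi_lo) * scale
    upper = math.inf if pi_hi >= 1.0 else pi_hi / (1.0 - pi_hi) * scale
    return lower, upper


print(poisson_ratio_bounds(6, 8, 100, 100, 0.025, 0.025))  # compound D, groups 2, 3
print(poisson_ratio_bounds(0, 2, 40, 40, 0.05, 0.0))       # compound C, group 3
```

Because exact quantiles replace chart or table readings, results can differ slightly in the third significant figure from values read off the charts.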

We  illustrate  the  use  of  this  Poisson  based  procedure  on  several  sets  of 
data.  First  we  consider  the  Holcombe  and  Phipps  compound  D  fry  mortality 
data.  We  pool  responses  across  tanks  within  groups. 


          CONTROL   GROUP 2   GROUP 3   GROUP 4   GROUP 5   GROUP 6
DEAD      6         8         8         13        79        100
LIVE      94        92        92        87        21        0
TOTAL     100       100       100       100       100       100

We compare various treatment groups with the control group. We will calculate
two sided 95 percent, nonsimultaneous confidence intervals. Groups
2 and 3 appear to have response probabilities around 0.10 and group 4 does
not seem to be too much beyond this level. We thus stretch our criterion
a bit and calculate confidence intervals to compare groups 2, 3, 4 with
the control group.

Note that we could modify the confidence intervals for simultaneity
by using Bonferroni's inequality (for $k$ comparisons, compute each interval at level $1 - \alpha/k$).

Groups 2, 3 vs Control:


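$X_2 = 8$, $X_1 = 6$, $N_2 = N_1 = 100$, $\alpha_1 = \alpha_2 = .025$.

Thus

$$\left(\frac{8}{7}\cdot\frac{1}{F(14,\ 16;\ .975)},\ \ \frac{9}{6}\,F(18,\ 12;\ .975)\right) = \left(\frac{8}{7}\cdot\frac{1}{2.83},\ \ \frac{9}{6}(3.11)\right) = (0.404,\ 4.665).$$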

Group 4 vs Control:

$X_4 = 13$, $X_1 = 6$, $N_4 = N_1 = 100$, $\alpha_1 = \alpha_2 = .025$.

Thus

$$\left(\frac{13}{7}\cdot\frac{1}{F(14,\ 26;\ .975)},\ \ \frac{14}{6}\,F(28,\ 12;\ .975)\right) = (0.76,\ 6.95).$$

Comparing the confidence intervals obtained by methods 1, 2, 3 we
see that

                          Asymptotic       Conservative      Poisson
                                           Small Sample      Approximation
Groups 2, 3 vs Control    (.480, 3.703)    (.401, 4.592)     (.404, 4.665)
Group 4 vs Control        (.857, 5.474)    (.788, 6.885)     (.76, 6.95)

Thus the asymptotic intervals are shorter than either of the small sample
intervals. The small sample intervals are thus more conservative.

We  next  consider  the  DeFoe  compound  C  fry  mortality  data.  We 
again  pool  across  tanks  within  groups. 


          CONTROL   GROUP 2   GROUP 3   GROUP 4   GROUP 5   GROUP 6
DEAD      0         0         2         1         9         40
LIVE      40        40        38        40        31        0
TOTAL     40        40        40        41        40        40

Since there are zero responses in the control group (i.e., $X_1 = 0$), we can
only calculate lower confidence bounds.

Group 2 vs Control: Since Group 2 has 0 responses also, the situation
is indeterminate.

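Group 3 vs Control: Choose $\alpha_1 = .05$, $\alpha_2 = 0$; $X_3 = 2$, $X_1 = 0$, $N_3 = N_1 = 40$.

Then

$$\frac{2}{1}\cdot\frac{1}{F(2,\ 4;\ .95)} = \frac{2}{6.94} = 0.288$$

is a 95 percent lower confidence bound on $p_3/p_1$. Thus there is no statistical
evidence, at the $\alpha = .05$ level, that $p_3 > p_1$.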

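Group 4 vs Control: $X_4 = 1$, $X_1 = 0$, $N_4 = 41$, $N_1 = 40$. Choose $\alpha_1 = .05$, $\alpha_2 = 0$.

Then

$$\frac{1}{1}\cdot\frac{1}{F(2,\ 2;\ .95)}\cdot\frac{40}{41} = \frac{40}{41}\cdot\frac{1}{19.0} = 0.051$$

is a 95 percent lower confidence bound on $p_4/p_1$. Thus there is no statistical
evidence, at the $\alpha = 0.05$ level, that $p_4 > p_1$.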

In general the confidence intervals that we have calculated are too
wide to determine the ratios of the various probabilities with much
precision. We must conclude that the data are not sufficient to estimate
these ratios very precisely without placing further structure on
the problem. One way of imposing such further structure will be discussed
in the following sections.




Figure XIII.1 Continued

Figure XIII.2 EXAX2 output from calculation of exact, small sample confidence intervals on the treatment-control odds ratio

Poisson two sample confidence interval charts, 90 percent (from Nelson, 1969)


XIV.  DOSE  RESPONSE  CURVE  ESTIMATION  —  PROBIT  ANALYSIS 


A. Introduction. Dose Response Estimation vs Hypothesis Testing


An alternative approach to estimating acceptable concentration
levels is based on fitting dose response models to the data and
estimating that concentration, $C_L$, which results in an increase
of at most L in the response rate over and above background level.
The dose response curve formulation is pictured schematically in
Figure XIV.1. The problem of determining a safe concentration
has been transformed from a testing problem (determine which response
rates are significantly different than the control rate)
to an estimation problem (calculate a lower confidence bound on $C_L$).

The two formulations are conceptually different and lead to
different implications. With the classical hypothesis testing
formulation the larger and more precise the experiment the more
powerful will be the hypothesis test. Thus lower concentration
levels will be found significantly different from the control
group and so the acceptable concentration will be decreased.

By contrast, with the dose response curve estimation formulation
the larger and more precise the experiment, the higher will be
the lower confidence bound on $C_L$ and so the acceptable concentration
will be increased. This latter situation seems more
natural to us for two reasons.


1. There is no need to specify rigid sample size requirements
in the protocol. People could present any level of evidence
regarding safe concentrations that they wish. The more extensive
the experiment, the higher will be the lower confidence
bound on $C_L$.

2. An investigator conducting toxicity tests in support of
petitions to the EPA for discharge permits is induced to
carry out more extensive and more precise experimentation
by the economics of the situation. He is rewarded for his
efforts by demonstrating a greater safe concentration.


OPINION: We feel that increased emphasis should be placed on the fitting
and use of dose response curve models in the design and analysis of
data from aquatic toxicity tests.

It should be noted that just because we define $C_L$ in terms of the
concentration associated with an increase in response rate of L units
over background does not mean that we consider killing 100L percent of
the fish to be "acceptable". No increased mortality is really desirable.
However by adopting this formulation we can argue that we are limiting
the added risk to at most L. The choice of L in a particular situation
would of course need to be a biological and a regulatory decision.
We have fitted (or attempted to fit) a number of dose response models
to the embryo and fry mortality data. Some of these models are standard
while others are nonstandard. Among the standard models fitted are the
probit model (Finney [11]) with either logarithmic or untransformed concentration
and the logit model with either logarithmic or untransformed
concentration levels. Both of these models classically account for background
variation by means of Abbott's correction. For example a probit
model with Abbott's correction might state

$$p(\text{conc}) = p_0 + (1 - p_0)\,\Phi(\beta_0 + \beta_1 \ln(\text{conc}))$$

where $p_0$, $p(\text{conc})$ are the response rates at the control and at conc respectively,
$\Phi(\cdot)$ is the normal c.d.f., and $p_0$, $\beta_0$, $\beta_1$ are unknown parameters
to be estimated from the model fit. Such a probit model can easily
be fitted to the data using SAS PROC PROBIT [12]. The 1979 version of
the BMDP package [27] contains a stepwise logistic regression program.

Among nonstandard dose response models tried are a nonstandard probit
type model and a "nonparametric" dose response model. The nonstandard
probit type model differs from the standard model in the way it handles
background response. One version can be written as

$$p(\text{conc}) = \Phi(\alpha_0 + \alpha_1 \ln(\text{conc} + c))$$

where $p(\text{conc})$ is the response rate at conc, $c$ accounts for the background
response, and $\alpha_0$, $\alpha_1$, $c$ are unknown parameters to be estimated from the
model fit. A criticism of Abbott's correction is that it tacitly assumes
that background related response and toxicant related response are due to
different and independent mechanisms. The nonstandard model assumes that
background related responses and toxicant related responses are due to
similar mechanisms and thus that background acts like an incremental toxicant
level $c$. Which (if either) model is more appropriate in a given
situation depends on how well they fit the data and on biological judgement.
The nonstandard probit model and a large family of other standard
and nonstandard dose response models can be fitted by the use of nonlinear
regression programs such as SAS PROC NLIN [12] and BMDP programs BMDP3R,
BMDPAR [27] (program versions 1977 or later).
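The two ways of handling background can be compared side by side. This is a minimal sketch that only evaluates the two model forms above; the parameter values are hypothetical, chosen so that both models give a background rate of roughly 0.05, and no fitting is done. The Abbott form involves $\ln(\text{conc})$, so at conc = 0 it is replaced by its limit $p_0$ (valid when $\beta_1 > 0$), while the incremental form is defined at conc = 0 directly.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal c.d.f.


def abbott_probit(conc: float, p0: float, beta0: float, beta1: float) -> float:
    """Standard probit with Abbott's correction (requires conc > 0):
    p(conc) = p0 + (1 - p0) * Phi(beta0 + beta1 * ln(conc))."""
    return p0 + (1.0 - p0) * PHI(beta0 + beta1 * math.log(conc))


def incremental_probit(conc: float, alpha0: float, alpha1: float, c: float) -> float:
    """Nonstandard probit where background acts like an extra toxicant level c:
    p(conc) = Phi(alpha0 + alpha1 * ln(conc + c)), defined at conc = 0."""
    return PHI(alpha0 + alpha1 * math.log(conc + c))


# Hypothetical parameters, for illustration only
for conc in (0.0, 1.0, 10.0, 100.0):
    abbott = abbott_probit(conc, 0.05, -4.0, 1.5) if conc > 0 else 0.05
    incr = incremental_probit(conc, -4.0, 1.5, 4.8)
    print(f"conc = {conc:6.1f}  Abbott = {abbott:.3f}  incremental = {incr:.3f}")
```

With these values the incremental model rises away from background sooner at low concentrations, because the toxicant adds to an effective dose that is already $c$.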

We  have  developed  a  "nonparametric"  dose  response  model  that  differs 
from  the  more  usual  parametric  models  in  a  number  of  ways. 

1.  There  is  no  need  to  make  strong  parametric  assumptions  about  the 
form  of  the  dose  response  model. 

2.  There  is  no  need  to  be  concerned  with  transformations  of  the 
concentration  levels. 

3.  There  is  no  need  to  worry  about  the  parametric  form  used  to 
correct  for  background  variation. 


4.  Exact,  small  sample  theory  is  used  to  construct  conservative 
lower  bounds  on  safe  concentration. 

We  have  developed  a  special  purpose  computer  program  to  carry  out  such 
nonparametric  dose  response  analyses.  It  is  described  in  detail  in 
Feder  and  Sherrill  [  41  ] ,  which  is  included  as  an  appendix  to  section 
XVI. 


We  now  illustrate  inferences  about  safe  concentrations  based  on  the 
various  dose  response  models  discussed  above. 

B.  Probit  Analysis  Using  SAS  PROC  PROBIT 


In  this  subsection  we  fit  probit  models  to  the  fry  mortality 
data  from  the  DeFoe  test  on  compound  C  and  from  the  Holcombe  and 
Phipps  test  on  compound  D.  Although  we  do  not  adjust  the  data 
for  tank  to  tank  heterogeneity,  the  same  analyses  can  be  carried 
out  after  such  adjustments  have  been  made. 

We first consider the DeFoe data. The basic data are listed
in Figure XIV.2. There are two tanks per treatment group. Concentration
values for each group (in units of µg/liter) have
been determined as average values over all determinations and
over all tanks within each group. These are denoted as CONCMEAN.
Other variables of importance are

DEADESUM = # dead embryos in the tanks after hatch (after
about 5 days).

DEADFSUM = # dead fry after 32 days.

PRPDEADE, PRPDEADF = proportions of dead embryos and fry
respectively.

LOGCONC = log10(CONC)

Note that the measured concentration in the control group is not
zero here and that no fry mortality has occurred in the control
group. It is unclear from preliminary plots of proportions of
dead fry vs arithmetic and logarithmic concentration (not shown)
whether a probit model would better be fitted to arithmetic or
to logarithmic concentration. We will try both fits and
compare them.

We first fit a standard probit model using arithmetic concentration.
The specific model fitted is

$$p(\text{CONC}) = c + (1 - c)\,\Phi(\beta_0 - 5 + \beta_1\,\text{CONC})$$

where $\Phi(\cdot)$ is the standard normal c.d.f., $\beta_0$ and $\beta_1$ are unknown
model parameters that characterize the shape of the response
curve, and $c$ is the unknown model parameter that specifies
the background rate. (The quantity 5 in the argument of $\Phi(\cdot)$ is
due to probit convention.) We fit this model to the data with
SAS PROC PROBIT by maximum likelihood estimation. The output
resulting from this fit appears in Figures XIV.3 - XIV.6. The
interpretation of this output is as follows:

(1) Summary of the maximum likelihood iteration process.

Intercept, Slope correspond to $\beta_0$, $\beta_1$ respectively in the probit model.

c corresponds to the background rate, or threshold rate.

(Note that if c goes negative at an iteration step
it is set to 0.)

MU, SIGMA ($\mu$, $\sigma$) correspond to the mean and standard deviation
of the dose-response distribution.

$\mu$, $\sigma$ are related to $\beta_0$, $\beta_1$ as

$$\beta_0 = 5 - \mu/\sigma, \qquad \beta_1 = 1/\sigma.$$

These relations can be verified from the entries given in
the output.

(2) The estimated asymptotic variance-covariance matrix of $(\hat\beta_0, \hat\beta_1, \hat c)$.
This is based on the inverse of the estimated Fisher information matrix.

(3) The estimated asymptotic variance-covariance matrix of $(\hat\mu, \hat\sigma, \hat c)$.
This is based on the inverse of the estimated Fisher information matrix.


Note: These estimated var-cov matrices are the basis of the confidence
interval calculations made by the program. The validity of these var-cov
estimates depends on having the true state of nature and the maximum likelihood
estimates interior to the parameter space. In this fit $\hat c = 0.003$
with an estimated standard error of 0.03. We thus might consider dropping
c from the model.

(4) Chi square test for lack of fit of the probit model.

Degrees of freedom = number of groups - number of
parameters = 6 - 3 = 3.

Under the null hypothesis of no lack of fit to the model
this statistic has approximately a chi square distribution with 3
d.f.


(5) A plot of the fitted straight line in the probit domain,
with the probits of the observed response rates at
each concentration level in the data indicated as X's on
the plot.


Notes: 1. $\text{Probit}(0) = 5 + \Phi^{-1}(0) = -\infty$. However it is
plotted as 0 because that is the smallest value
used. Similarly, $\text{Probit}(1) = 5 + \Phi^{-1}(1) = \infty$,
but it is plotted as 10 because that is the largest
value used.

2. The observed points at probit values of 0
and 10 seem far away from the fitted line. The
standard errors of these points are also very
large, so these points are discounted when determining
a probit fit. In particular, with $\phi$ the standard normal density,

$$\mathrm{Var}[\text{probit}(\hat p)] \approx \frac{p(1 - p)}{n\,\phi(\Phi^{-1}(p))^2} \to \infty \quad \text{as } p \to 0 \text{ or } 1.$$

Thus these points carry very little weight in the
straight line fit in the probit domain.

3. The estimated background response rate has been
removed from the plot. Thus the estimates represent
increments over background.

(6) For various percentiles of the fitted dose response curve
(after adjusting for background), the point estimates of
CONCMEAN are given as well as 95% lower and upper confidence
bounds on these points.

Note: These percentiles are percentages of the population
responding due to the toxicant, after adjusting for
background effects.

(7) Plot of $\Phi(\hat\beta_0 - 5 + \hat\beta_1\,\text{CONC})$ vs CONC.

The point estimates correspond to the percentiles indicated
on the plots.
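Note 2 can be checked numerically. This sketch evaluates the delta-method variance of an empirical probit for a group of $n = 40$ fish (the DeFoe group size), using only `statistics.NormalDist`; the function name `probit_variance` is hypothetical, and the reciprocal is printed only as a rough indication of the relative weight such a point would receive.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def probit_variance(p: float, n: int) -> float:
    """Delta-method variance of probit(p_hat) = 5 + Phi^{-1}(p_hat)
    for a proportion estimated from n fish with true rate p (0 < p < 1)."""
    z = STD_NORMAL.inv_cdf(p)
    return p * (1.0 - p) / (n * STD_NORMAL.pdf(z) ** 2)


for p in (0.5, 0.1, 0.01, 0.001):
    var = probit_variance(p, 40)
    print(f"p = {p:<6} var = {var:8.4f}  relative weight = {1.0 / var:7.2f}")
```

The variance grows rapidly as $p$ moves toward 0 (and, symmetrically, toward 1), which is why the plotted points at probits 0 and 10 have little influence on the fitted line.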


These lower confidence bounds are just the quantities needed to calculate
confidence bounds on safe concentrations. Lower 95% two sided
bounds correspond to lower 97.5% one sided bounds. Suppose we are willing
to tolerate an increase in response of 10 percent due to toxicant
causes. What is a lower confidence bound on safe dose?


Consider the dose response curve (adjusted for background rate). The
concentration $d_{10}$ at which the toxicant related response is 10 percent
satisfies $\Phi(\beta_0 - 5 + \beta_1 d_{10}) = 0.10$, so

$$d_{10} = \frac{\Phi^{-1}(0.10) + 5 - \beta_0}{\beta_1}.$$

Now $\hat d_{10} = \dfrac{\Phi^{-1}(0.10) + 5 - \hat\beta_0}{\hat\beta_1} = 32.65$ from Figure XIV.6.

The lower 97.5 percent confidence bound on $d_{10}$ is $-45.97$. It is
thus totally uninformative due to the gentle slope of the dose response
curve (0.036) and the relatively large standard error of the slope (0.014).
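The percentile formula can be evaluated directly. This sketch uses the rounded threshold-model estimates $\hat\beta_0 = 2.537$, $\hat\beta_1 = 0.036$ quoted later in this section, so it only approximates the program's value of 32.65, which is based on unrounded estimates. The function names are hypothetical, and no confidence bound is computed (the program's bounds use Fieller's theorem and the estimated var-cov matrix, which are not reproduced here).

```python
from statistics import NormalDist


def dose_percentile(P: float, beta0: float, beta1: float) -> float:
    """Concentration at which the background-adjusted response equals P,
    i.e. the solution of Phi(beta0 - 5 + beta1 * CONC) = P."""
    return (NormalDist().inv_cdf(P) + 5.0 - beta0) / beta1


def mu_sigma(beta0: float, beta1: float):
    """Invert beta0 = 5 - mu/sigma, beta1 = 1/sigma."""
    sigma = 1.0 / beta1
    return (5.0 - beta0) * sigma, sigma


b0, b1 = 2.537, 0.036  # rounded threshold-model estimates
print(f"d10 approx {dose_percentile(0.10, b0, b1):.1f}")
print("mu, sigma approx", mu_sigma(b0, b1))
```

The 50th percentile equals $\mu$, which ties the MU/SIGMA output to the slope and intercept.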

Note that the confidence bounds on the percentiles of the dose response
curve are based on Fieller's theorem. See Finney [11], section 4.7
(especially eqns (4.37), (4.38)) for details.


We now consider the chi square test statistic for goodness of fit
in more detail. The chi square statistic can be used for a number of
purposes. The statistic given in (4) is

$$\chi^2 = \sum_{i=1}^{6} \frac{(O_i - N_i \hat p_i)^2}{N_i \hat p_i \hat q_i}$$

where

$O_i$ = number of observed responses in the i-th treatment group,

$N_i$ = number of fish in the i-th treatment group,

$\hat p_i$ = estimated response probability in the i-th treatment group,

$\hat q_i = 1 - \hat p_i$,

and, from the fitted model,

$$\hat p_i = \hat c + (1 - \hat c)\,\Phi(\hat\beta_0 - 5 + \hat\beta_1 d_i) = \hat c + (1 - \hat c)\,\Phi\!\left(\frac{d_i - \hat\mu}{\hat\sigma}\right), \qquad i = 1, \ldots, 6,$$

with $d_i$ the concentration (CONCMEAN) in group $i$.


The values of these quantities for the 6 treatment groups are as follows:

Trt Grp   O     N     p̂        Np̂        Np̂q̂      (O - Np̂)²/(Np̂q̂)
1         0     40    0.0100   0.3996    0.3956   0.4036
2         0     40    0.0115   0.4586    0.4534   0.4639
3         2     40    0.0154   0.6152    0.6058   3.1655
4         1     41    0.0300   1.2300    1.1931   0.0443
5         9     40    0.2397   9.5876    7.2896   0.0474
6         40    40    0.9979   39.9144   0.0854   0.0858
                                                 Σ = 4.2105

We see that this chi square statistic agrees with that calculated
in (4) of the PROBIT output.

We should break out the cell by cell contributions in order to ensure
that a large value of chi square is not due to one or a few cells with
very low expected frequency. Just one observed response in such a cell
can inflate the chi square statistic tremendously. In our case this does
not occur.
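Breaking out the contributions is easy to automate. This sketch recomputes the per-group terms from the observed counts and the fitted probabilities in the table above; because the tabled $\hat p_i$ are rounded to four decimals, the total comes out near 4.20 rather than exactly 4.2105. The function name is hypothetical.

```python
def chi_square_contributions(observed, n, p_hat):
    """Per-group terms (O_i - N_i p_i)^2 / (N_i p_i q_i) and their total."""
    terms = []
    for o, n_i, p in zip(observed, n, p_hat):
        expected = n_i * p
        terms.append((o - expected) ** 2 / (expected * (1.0 - p)))
    return terms, sum(terms)


# DeFoe fry mortality; fitted probabilities from the threshold probit fit
O = [0, 0, 2, 1, 9, 40]
N = [40, 40, 40, 41, 40, 40]
P_HAT = [0.0100, 0.0115, 0.0154, 0.0300, 0.2397, 0.9979]

terms, total = chi_square_contributions(O, N, P_HAT)
for i, t in enumerate(terms, start=1):
    print(f"group {i}: {t:.4f}")
print(f"chi square = {total:.4f}")
```

Group 3, with an expected count of only about 0.6 responses, supplies roughly three quarters of the total, which illustrates why the cell by cell breakdown is worth inspecting.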

Note that the applicability of the asymptotic chi square approximation
to the distribution of $\chi^2$ is doubtful here due to the small
expected cell frequencies. Namely

i        1       2       3       4       5       6
Np̂_i     0.40    0.46    0.62    1.23    9.59    39.91
Nq̂_i     39.60   39.54   39.38   39.77   30.41   0.09

Dixon and Massey [13], page 238, state that for the asymptotic χ² approximation
to be close "the sample size N must be sufficiently large that none of the
F_i's (i.e. N_i p̂_i or N_i q̂_i) is less than 1 and not more than 20 per cent
of the F_i's are less than 5." This criterion is clearly not met in the
above example.
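The Dixon and Massey rule of thumb can be checked mechanically. This sketch (function name hypothetical) forms both expected frequencies $N_i\hat p_i$ and $N_i\hat q_i$ for every group and applies the two conditions quoted above.

```python
def expected_frequency_rule(n, p_hat, min_any=1.0, small=5.0, max_small_frac=0.20):
    """Rule quoted from Dixon and Massey: no expected frequency below 1
    and at most 20 percent of them below 5."""
    expected = []
    for n_i, p in zip(n, p_hat):
        expected.extend([n_i * p, n_i * (1.0 - p)])  # N_i p_i and N_i q_i
    n_below_min = sum(e < min_any for e in expected)
    frac_small = sum(e < small for e in expected) / len(expected)
    ok = n_below_min == 0 and frac_small <= max_small_frac
    return ok, n_below_min, frac_small


N = [40, 40, 40, 41, 40, 40]
P_HAT = [0.0100, 0.0115, 0.0154, 0.0300, 0.2397, 0.9979]
print(expected_frequency_rule(N, P_HAT))
```

For the DeFoe fit, four of the twelve expected frequencies are below 1 and five (about 42 percent) are below 5, so both conditions fail.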

Since no control group mortality was observed and since the estimated
background rate is compatible with 0 ($\hat c = 0.0031$, $\text{stderr}(\hat c) = .0288$),
it was decided to refit the model specifying that $c = 0$. This simplification
will reduce the standard errors of estimates considerably.
The fit is shown in Figure XIV.7 and the associated confidence intervals
are given in Figure XIV.8. The point estimates of the slope and
intercept are seen to be very similar to those based on the threshold
model fit in Figure XIV.3. In particular

                     β̂0      β̂1      ĉ              σ(β̂0)   σ(β̂1)   σ(ĉ)
Threshold Model      2.537   0.036   0.003          0.834   0.014   0.029
No Threshold Model   2.616   0.035   0 (by defn.)   0.257   0.006   0 (by defn.)

We see that the point estimates of β0, β1, c have not changed by much,
but the standard errors have decreased markedly. Thus if there is no
statistical evidence of background mortality we should eliminate it from
the model to increase estimation precision.

Let's see how this affects the percentile point estimates and lower
confidence bounds on them.

              Threshold Fit                   No Threshold Fit
Percentile    Point      Lower 97.5 Percent   Point      Lower 97.5 Percent
              Estimate   Confidence Bound     Estimate   Confidence Bound
1             3.773      -157.377             1.657      -17.646
3             16.086     -109.518             14.388     0.000
5             22.606     -84.331              21.130     8.900
10            32.647     -45.967              31.511     21.622
15            39.421     -20.657              38.515     29.364
20            44.805     -1.246               44.082     35.034
30            53.572     27.152               53.146     43.539
50            68.064     54.847               68.130     56.372
70            82.557     67.675               83.115     68.459
80            91.323     73.775               92.179     75.594

We see that the point estimates under the nonthreshold fit are
slightly lower than the point estimates under the threshold fit in the
lower portion of the curve.

However the increased precision of estimation under the nonthreshold
fit results in substantial increases in the lower confidence bounds.


Suggestion:

1. If there is no observed response in the control group,

and 2. if the control group response rate, as estimated from the
dose response fit (including threshold), is nonsignificant,

and 3. if there is no a priori reason to expect a background rate,

then eliminate the background threshold parameter from the model.
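The three-part suggestion can be written as a small decision function. This is a sketch under an explicit assumption: the report does not say which significance test defines "nonsignificant", so here it is taken to be a Wald ratio $|\hat c/\text{se}(\hat c)|$ below a hypothetical critical value of 1.96. The function and argument names are hypothetical.

```python
def drop_background(control_deaths: int, c_hat: float, se_c: float,
                    expect_background: bool, z_crit: float = 1.96) -> bool:
    """Return True if the background parameter c should be dropped.

    ASSUMPTION: 'nonsignificant' is taken to mean |c_hat / se_c| < z_crit
    (a Wald test); the report does not specify the test."""
    nonsignificant = abs(c_hat / se_c) < z_crit
    return control_deaths == 0 and nonsignificant and not expect_background


# DeFoe fry data: no control deaths, c_hat = 0.0031 with stderr 0.0288
print(drop_background(0, 0.0031, 0.0288, expect_background=False))
```

For the DeFoe fit all three conditions hold, which matches the decision above to refit with $c = 0$.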

This raises a conjecture: Suppose we fit a nonthreshold model even
when a nonzero background rate exists. We conjecture that the point
estimates of nonthreshold response rates will be estimates of quantities
lower than the true response rates. However the increased precision of
these estimates may well result in more accurate lower confidence bounds
on the "true" response percentiles. This is a bias-variance trade-off.

It is interesting to note that Charles Stephan [42], page 78 ff.,
discusses Abbott's correction in connection with the estimation of the LC50
concentration in acute toxicity tests. He comments "...Abbott's
formula ... is a statistically sound way of correcting for control mortality
if, and only if, the cause of the control mortality does not make
the rest of the test organisms more susceptible to the toxicant. This
assumption is usually questionable in acute mortality tests with aquatic
animals. ... If control mortality is below a specified minimum... it should
be reported along with the results of the test, but correction of the
LC50 for this mortality would seem to be a meaningless exercise. ..."

It is interesting that we arrive at a similar suggestion, based on different
reasoning. Our motivation is a bias-variance trade-off.

The previous PROC PROBIT analyses on the DeFoe data treated concentration
without any transformation. We also tried to fit a probit model
using log concentration. Folklore states that a probit or logit model
will better fit the response vs logarithmic concentration relation than
the response vs arithmetic concentration relation.

Finney [11], pages 8-13, recommends using log concentration. Stephan
[42] also recommends the use of a logarithmic transformation of
concentration on a routine basis.

Finney, pages 9ff, states "The frequency distribution of
tolerances, as measured on the natural scale (i.e. arithmetic
scale - P.F.) is usually markedly skew, but often a simple
transformation of the scale of measurement will convert it to
a distribution approximately of the familiar Gaussian or normal
form ... normalization can often be effected by expressing
the tolerances in terms of the logarithms of the concentrations
instead of the absolute values. Indeed this transformation is
now standard practice ... the justification is the widespread
applicability of the normal distribution as an adequate approximation
to the truth. ..."
Stephan, page 75, states: "...Whenever any method is used to analyze
concentration mortality data, whether or not a transformation such as
probit, logit, or angle is used on the mortality data, the logarithmic
transformation should probably be used on the concentration data. All of the
methods assume that the concentration-mortality curve is linear, and it
seems to be generally accepted that the curve is more likely to be linear if
log concentration is used. ..."

We show, using the DeFoe 1,1,2-trichloroethane data, that the logarithmic
transformation of concentration provides a much poorer fit of the probit
model than does arithmetic concentration. The moral is that each time we fit
a probit, logit, or other dose response model we should keep an open mind
about using untransformed concentration, logarithmic concentration, or some
other function of concentration, and transform concentration in the manner
that suits each individual data set.

We first tried to fit the probit model with background response to
logarithmic concentration. The attempted fit would not converge. To improve
convergence we refitted the DeFoe data with logarithmic concentration using a
specified background rate of 0 (the observed level). The output appears in
Figure XIV.9. The maximum likelihood algorithm converges; however, the
resulting probit model is not an adequate fit to the data, as indicated by
the highly significant residual chi square statistic (chi square = 47.6774
with 4 d.f.).

We break out the components of the chi square statistic by group, as
discussed previously, to determine whether the very large chi square value
is due just to one or two components with very small expected responses but
with one or two observed responses. Such components can greatly distort the
overall chi square.
[Table: components of the residual chi square by treatment group; only
fragments are legible in the source: 40, 4, 1, 41, 9, 40, 2.1673, 40, 0.884,
35.36]
We see that the third treatment group contributes the most to the
statistic. It has a very small expected frequency and two observed responses.
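The group-by-group breakdown is easy to reproduce. The Python sketch below assumes the Pearson form of each component, (X_i - N_i p_i)^2 / [N_i p_i (1 - p_i)], which matches the weighted residual sum of squares used for the PROC NLIN fits later in this report. The function name is hypothetical, and the counts in the example are illustrative, not the DeFoe data.

```python
def chi_square_components(deaths, exposed, probs):
    """Pearson chi square component for each group:
    (X_i - N_i p_i)^2 / (N_i p_i (1 - p_i)).
    deaths = observed responses X_i, exposed = N_i, probs = fitted p_i."""
    comps = []
    for x, n, p in zip(deaths, exposed, probs):
        if not 0.0 < p < 1.0:
            raise ValueError("fitted probabilities must lie strictly in (0, 1)")
        expected = n * p                 # N_i p_i
        variance = n * p * (1.0 - p)     # binomial variance
        comps.append((x - expected) ** 2 / variance)
    return comps


# Illustrative numbers only: the middle group expects 0.2 deaths out of 40
# but shows 2, and that single cell dominates the total.
comps = chi_square_components(deaths=[1, 2, 10], exposed=[40, 40, 40],
                              probs=[0.02, 0.005, 0.25])
print([round(v, 2) for v in comps], round(sum(comps), 2))
# -> [0.05, 16.28, 0.0] 16.33
```

Dropping a dominant cell and referring the remaining sum to a chi square distribution with one fewer degree of freedom is the check applied in the next paragraph.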

If this were the only large deviation between data and model, we might be
inclined to consider the possibility of a reasonable probit fit apart from
one outlying group. However, even if we disregard this group, the components
of chi square from the remaining cells sum to 12.144 with 3 d.f. This value
is still significant at the 0.01 level, even after the largest component has
been deleted. We thus conclude that the model does not fit the data well.
This, together with the probit plots, indicates that the probit fit after a
logarithmic transformation of concentration is inappropriate. The probit fit
to untransformed concentration is superior in this case.

We now consider the Holcombe and Phipps compound D data. A listing of these
data is contained in Figure XIV.10. The variable names correspond to those of
the DeFoe data. The test consisted of six groups (control + 5 toxicant
concentrations) with four tanks per group. Note that the control group
concentration is 0 and that there is a nonzero threshold (background)
response rate.

Probit models were fitted to the fry mortality data after pooling tanks
within concentration groups. SAS PROC PROBIT was used to fit probit models
both to concentration and to log10(concentration). These fits included a
background response rate, estimated by maximum likelihood.

The probit fit vs untransformed concentration appears in Figures XIV.11 and
XIV.12. The residual chi square statistic is quite small (0.3361 with 3
d.f.), signifying a good fit to the data. Figure XIV.12 contains the
estimated percentiles of the probit response curve, adjusted for background,
along with lower and upper 95 percent confidence limits calculated by use of
Fieller's theorem. The lower limit of such a two-sided 95 percent interval is
a one-sided 97.5 percent lower confidence bound. For example, for the 10th
percentile the estimated concentration is 78.77, while the lower 97.5 percent
confidence bound is 58.72.

One  difference  between  the  fits  to  the  DeFoe  and  to  the  Holcombe  and 
Phipps  data  should  be  noted.  In  the  DeFoe  data  no  mortality  was  observed 
in  the  control  group  and  the  threshold  response  rate  was  estimated  to  be 
c  =  0.003  with  an  asymptotic  standard  error  of  0.029.  Thus  there  was  no 
evidence  of  background  mortality  and  we  markedly  improved  precision  of 
the  fit  by  deleting  the  background  correction. 

In the case of the Holcombe and Phipps compound D data we observe 6 deaths
within the control group, with each of the 4 tanks exhibiting at least one
death. Thus we know that there is background response. From our probit fit
with arithmetic concentration we estimate c = 0.0718 with an asymptotic
standard error of 0.016. Thus c is about 4.5 asymptotic standard errors from
0 and so is highly statistically significant.

We now fit a probit model to the same data using log10(concentration). The
estimated parameter values, their estimated asymptotic variance-covariance
matrix, and the residual chi square statistic appear in Figure XIV.13. The
residual chi square statistic is 0.5046 with 3 d.f., which is very small,
indicating a good fit to the data.(1) We thus have good probit fits to the
data using both arithmetic and logarithmic concentration.

(1) Note that the chi square value 0.2287 given by SAS in Figure XIV.13 is
incorrect in this case. It appears to omit the control group contribution to
the chi square statistic. This problem has been brought to the attention of
the program developer and has been corrected in later versions of the
program.


The estimated background level is c = 0.0738 with a standard error of
0.0156. Thus c is 4.73 standard errors from 0 and so is highly statistically
significant.
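Both significance statements are Wald-type comparisons: the estimate divided by its asymptotic standard error, referred to the standard normal distribution. A minimal Python sketch using the values quoted above; the helper name is hypothetical, and the two-sided p-value it prints is not part of the PROC PROBIT output shown here.

```python
from statistics import NormalDist

def wald_z(estimate, std_error):
    """Approximate z statistic and two-sided p-value for H0: parameter = 0."""
    z = estimate / std_error
    p_two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Background rate c from the arithmetic and logarithmic probit fits.
for label, c_hat, se in [("arithmetic", 0.0718, 0.016),
                         ("logarithmic", 0.0738, 0.0156)]:
    z, p = wald_z(c_hat, se)
    print(f"{label}: z = {z:.2f}, two-sided p = {p:.1e}")
```

The z values come out at about 4.49 and 4.73, matching the "4.5" and "4.73" in the text.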

The estimated percentiles (after adjusting for control group mortality) and
95 percent confidence intervals (by Fieller's theorem) are shown in Figure
XIV.14. The lower confidence bound is a 97.5 percent one-sided bound. This
display is analogous to that in Figure XIV.12.


We have thus fitted two distinct models which seem to fit the data well: the
probit model with arithmetic concentration (Figure XIV.11) and the probit
model with logarithmic concentration (Figure XIV.13). The parameter estimates
associated with these two fits are somewhat different:


Parameter   Arithmetic (StdErr)   Logarithmic (StdErr)
μ           109.916 (3.638)       2.028 (0.016)
σ           24.300 (3.861)        0.1044 (0.015)
c           0.0718 (0.016)        0.0738 (0.016)


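We see that the estimated background levels are similar; however, the
estimates of μ and σ are very different. This is expected, since the
logarithmic fit expresses μ and σ on the log10 scale: 10^2.028 ≈ 106.7, close
to the arithmetic estimate of 109.9 for the median.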


Even though the parameter estimates differ considerably, the model fits may
still be very similar. We compare the estimated response distribution
percentiles and associated lower confidence bounds in Figures XIV.12 and
XIV.14 for the arithmetic and logarithmic concentration fits respectively.
These are shown below.

             Arithmetic (Figure XIV.12)     Logarithmic (Figure XIV.14)
Percentile   Point Est.   Lower 97.5%       Point Est.   Lower 97.5%
 1           53.39        22.40             60.97        ...
 2           60.01        31.92             65.10        ...
 3           64.21        37.95             67.87        ...
 5           69.95        46.15             71.83        57.47
10           78.77        58.72             78.38        64.83
15           84.73        67.13             83.14        70.29
20           89.46        73.78             87.12        74.92
30           97.17        84.43             ...          ...
The point estimates of the response distribution percentiles corresponding
to the arithmetic and logarithmic fits are quite similar beyond the third
percentile. However, there is considerable discrepancy between the
corresponding lower confidence bounds on safe dose from the two fits below
the 10th percentile. Below the third percentile the discrepancy is fifty
percent or more.
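The point estimates follow directly from the fitted μ and σ: under the probit model the background-adjusted 100P-th percentile is μ + σΦ^{-1}(P) on the fitting scale, back-transformed by 10^(.) for the logarithmic fit. A minimal Python sketch, using the rounded PROC PROBIT estimates listed earlier (μ = 109.916, σ = 24.300 arithmetic; μ = 2.028, σ = 0.1044 on the log10 scale); the function name is hypothetical. Because the log-scale estimates are rounded, the logarithmic values can differ from the table in the last digit.

```python
from statistics import NormalDist

def percentile_conc(p, mu, sigma, log10_scale=False):
    """Concentration giving background-adjusted response p under a probit
    model Phi((x - mu) / sigma), where x is concentration or log10(conc)."""
    x = mu + sigma * NormalDist().inv_cdf(p)
    return 10.0 ** x if log10_scale else x


for p in (0.05, 0.10, 0.15):
    arith = percentile_conc(p, 109.916, 24.300)
    logc = percentile_conc(p, 2.028, 0.1044, log10_scale=True)
    print(f"{100 * p:.0f}th percentile: arithmetic {arith:.2f}, "
          f"logarithmic {logc:.2f}")
```

Only the point estimates can be reproduced this way; the lower bounds need the variance-covariance matrix and Fieller's theorem, discussed later in this report.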

How  can  we  choose  between  the  two  fits? 

1.  Prior  knowledge  or  mechanistic  information 

None here. Since the probit model is an empirical model, not much in the way
of mechanistic argument will distinguish between the two fits.

2.  Magnitude  of  residual  chi  square 

Arithmetic fit: residual chi square = 0.336 with 3 d.f. (significance
probability 0.953)

Logarithmic fit: residual chi square = 0.505 with 3 d.f. (significance
probability 0.918)

Both chi square values are quite small, and the difference between them is
probably just chance fluctuation. Therefore we should not use these two
statistically insignificant values to distinguish between the fits.

3.  Appearances  of  plots  of  predicted  and  observed  responses 

Scatterplots of predicted and observed responses vs arithmetic concentration
are shown in Figures XIV.15 and XIV.16. Similar plots vs logarithmic
concentration are shown in Figures XIV.17 and XIV.18. The probit plots
(Figures XIV.15 and XIV.17) indicate greater discrepancies between observed
and predicted responses (after adjusting both for background) at the low
percentiles for the logarithmic concentration fit than for the arithmetic
concentration fit. The same holds at the highest treatment group. Thus the
arithmetic concentration fit seems to be a (slightly) better approximation
to the data at the low percentiles than the logarithmic concentration fit.

4. Conservativeness. Below the 25th percentile the lower confidence bounds
based on the arithmetic concentration fit are lower than those based on the
logarithmic concentration fit. The discrepancy is especially noticeable at
the low percentiles, in particular below the 10th percentile; above the 10th
percentile the two sets of lower bounds are similar.

Thus the arithmetic concentration fit seems to be more conservative than the
logarithmic concentration fit at the low percentiles.
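The significance probabilities quoted in item 2 (0.953 and 0.918) are upper-tail chi square probabilities, as is the earlier judgment that 12.144 on 3 d.f. is significant at the 0.01 level. For integer degrees of freedom these probabilities have closed forms, so a short Python sketch needs only the standard library. This is one standard way to compute them, not necessarily how SAS does it; the function name is hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with df d.f. > x), integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{k=0}^{df/2 - 1} h^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= h / k
            total += term
        return math.exp(-h) * total
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum_{k=1}^{(df-1)/2} h^(k-1/2) / Gamma(k+1/2)
    total = math.erfc(math.sqrt(h))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * h ** (k - 0.5) / math.gamma(k + 0.5)
    return total


print(round(chi2_sf(0.3361, 3), 3))   # arithmetic probit fit
print(round(chi2_sf(0.5046, 3), 3))   # logarithmic probit fit
print(round(chi2_sf(12.144, 3), 4))   # DeFoe log fit, largest cell dropped
```

The same function gives vanishingly small tail probabilities for the 4 d.f. statistics reported for the fits that ignore background (28.96 and 65.03).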


Opinion: I would prefer the arithmetic concentration fit in this case.
However, further experimentation at low concentrations would be needed to
resolve the differing conclusions at the low percentiles.


Alternative  analysis  ignoring  background 

We  remarked  with  respect  to  the  analysis  of  the  DeFoe  data  that  an 
alternative  way  of  handling  background  mortality  is  to  ignore  it.  The 
hope  is  that  the  improved  precision  will  offset  the  downward  bias  and 
result  in  higher  values  for  lower  confidence  bounds  on  safe  dose. 

Note: If the control group response rate is significantly different from 0,
as is the case with the Holcombe and Phipps data, we would not expect the
dose response fit ignoring background to be a good fit to the data. Probit
models ignoring background response, fitted to both arithmetic and
logarithmic concentration, show large and highly significant residual chi
square statistics (28.96 and 65.03 respectively, each with 4 d.f.). The plots
of fitted and observed responses vs concentration also show discrepancies.

We  now  compare  the  percentile  point  estimates  and  their  lower  confidence 
bounds  based  on  the  fit  ignoring  background  response  with  those  based  on 
the  fit  including  background  response.  Comparisons  pertain  to  the  fit 
vs  untransformed  concentration. 


             Background Included          Background Excluded
Percentile   Point Est.   Lower 97.5%     Point Est.   Lower 97.5%
                          Conf. Bound                  Conf. Bound
 1            53.39        22.403           -7.48       -24.234
 3            64.21        37.946           13.37         0.648
 5            69.95        46.147           24.41        13.197
 7            74.05        52.006           32.33        21.991
10            78.77        58.715           41.42        32.626
15            84.73        67.135           52.89        45.860
20            89.46        73.775           62.01        56.369
25            93.53        79.421           69.83        64.178
50           109.92       101.288          101.40        97.761

We see that in this example, with the background level many standard errors
from 0, the bias-variance tradeoff is such that it does not pay to reduce the
assumed background response level to 0 in order to reduce the standard
errors of the estimates.
Figure XIV.3 Output from PROC PROBIT fit to DeFoe fry mortality data -- arithmetic concentration

Figure XIV.4 Output from PROC PROBIT fit to DeFoe fry mortality data -- arithmetic concentration

Figure XIV.5 Output from PROC PROBIT fit to DeFoe fry mortality data -- arithmetic concentration

Figure XIV.8 PROC PROBIT fit to DeFoe fry mortality data -- no background correction

Figure XIV.12 Output from PROC PROBIT fit to Holcombe and Phipps fry mortality data -- arithmetic concentration

Figure XIV.14 Output from PROC PROBIT fit to Holcombe and Phipps fry mortality data -- logarithmic concentration

Figure XIV.15 Plots of observed and predicted responses based on probit fit -- arithmetic concentration

Figure XIV.18 Plots of observed and predicted responses based on probit fit -- logarithmic concentration


DOSE  RESPONSE  CURVE  ESTIMATION  —  MAXIMUM  LIKELIHOOD  ESTIMATION  BY 
NONLINEAR  LEAST  SQUARES  REGRESSION 


We have seen in the previous section how standard dose response models can
be fitted to the data by use of SAS PROC PROBIT. This procedure fits a probit
model to the data, with several possible variations. Namely, it fits the
model

$$p(x) = c + (1 - c)\,\Phi(\beta_0 + \beta_1 x)$$

where x = concentration or log(concentration), c is the background rate,
$\Phi$ is the standard normal cumulative distribution function, and
$\beta_0$, $\beta_1$ are unknown parameters to be estimated from the data.
p(x) is the response probability corresponding to x. The value of c may be
known or unknown. Estimation is done by maximum likelihood, based on binomial
theory.
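To make the roles of c, β0, and β1 concrete, here is a minimal Python sketch of this model and of Abbott's correction solved for the response attributable to the toxicant. The function names are hypothetical, and the parameter values are illustrative, converted from the Holcombe and Phipps arithmetic fit reported earlier via β0 = -μ/σ and β1 = 1/σ.

```python
from statistics import NormalDist

def response_prob(x, b0, b1, c):
    """Probit model with background rate c (Abbott's correction):
    p(x) = c + (1 - c) * Phi(b0 + b1 * x)."""
    return c + (1.0 - c) * NormalDist().cdf(b0 + b1 * x)

def toxicant_response(p, c):
    """Abbott's formula solved for the response attributable to the
    toxicant: (p - c) / (1 - c)."""
    return (p - c) / (1.0 - c)


# Illustrative values: beta0 = -mu/sigma, beta1 = 1/sigma from the
# Holcombe and Phipps arithmetic fit (mu = 109.916, sigma = 24.300, c = 0.0718).
b0, b1, c = -109.916 / 24.300, 1.0 / 24.300, 0.0718
p_at_median = response_prob(109.916, b0, b1, c)
print(round(p_at_median, 4), round(toxicant_response(p_at_median, c), 4))
# -> 0.5359 0.5
```

At x = μ the total response is c + (1 - c)/2, and Abbott's correction recovers the 50 percent response over background.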


Jennrich and Moore [43] show that for distributions in the exponential
family, maximum likelihood calculations can be carried out by means of
nonlinear least squares regression calculations. This applies, in
particular, to models based on the binomial distribution.

Thus dose response curves can be fitted to the data by use of appropriate
nonlinear regression programs. Both SAS [12] (PROC NLIN) and BMDP [27] (P3R
and PAR) contain nonlinear regression programs that can carry out these
calculations. See Jennrich and Moore [43] for a discussion of the theory
underlying the relation between maximum likelihood estimation and nonlinear
regression in the exponential family. We illustrate the methodology with SAS
PROC NLIN. However, any nonlinear regression program able to carry out
iteratively reweighted least squares (i.e., one that allows the weights to be
functions of the model parameters) would suffice.

SAS PROC PROBIT also calculates lower and upper confidence bounds on the
concentrations corresponding to various response curve percentiles (after
adjusting for background), by use of Fieller's theorem (Finney [11], pp.
78-79). We illustrate how these confidence bounds can be calculated from the
parameter estimates of the fit and their asymptotic variance-covariance
matrix.

Before discussing the details of fitting dose response curves by means of
nonlinear regression programs, we discuss some of the reasons that one might
wish to do this.

1.  The  data  analyst  may  have  a  general  purpose  nonlinear  regression 
program  available  but  no  special  purpose  dose  response  estimation 
program.  Thus  the  general  tool  can  be  used  without  modification 
instead  of  having  to  write  a  special  purpose  program. 


2. A very wide variety of models can be fitted to the data by use of
general nonlinear regression. PROC PROBIT is rather limited in the range of
models it will fit. Namely, it will fit only a probit model using
concentration or log concentration, and it will adjust for background
effects only by Abbott's correction.

We may wish to fit models other than the probit, e.g., the logit model, or
even more complex models that incorporate both the probit and logit models
as special cases. Background effects might be modelled as additive
concentrations rather than by Abbott's correction. Namely,

$$p(x) = \Phi\bigl(\beta_0 + \beta_1 \log(x + c_0)\bigr)$$

where x is the concentration and $c_0$, the background effect, represents
an alternative way of accounting for background. Such a model, although
nonstandard, can easily be fitted by nonlinear regression techniques.
Transformations of concentration other than the logarithmic, for example the
square root transformation, are also useful.

3. PROC PROBIT automatically inflates variance and covariance estimates and
confidence interval bounds by heterogeneity factors whenever the probit model
does not fit the data (as determined by the residual chi square statistic).
This is not always what we wish to do. PROC NLIN does not inflate variance
estimates by heterogeneity factors.

4.  We  can  calculate  and  save  predicted  and  residual  values  and  thus 
easily  construct  residual  plots. 

It should be noted that PROC NLIN will not compute confidence intervals on
response curve distribution percentiles by use of Fieller's theorem, as PROC
PROBIT does. However, we show in the subsequent discussion how these
calculations can be carried out fairly easily, using either a hand
calculator or a small computer program, once the parameter estimates and
their asymptotic variances and covariances have been determined.

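We now consider three examples of fitting dose response models to fry
mortality data by use of SAS PROC NLIN. We use the Holcombe and Phipps fry
mortality data in all the examples. The models fitted are:

(1) $p(\mathrm{conc}) = c + (1 - c)\,\Phi(\beta_0 + \beta_1\,\mathrm{conc})$

(2) $p(\mathrm{conc}) = c + (1 - c)\,\dfrac{e^{\beta_0 + \beta_1\,\mathrm{conc}}}{1 + e^{\beta_0 + \beta_1\,\mathrm{conc}}}$

(3) $p(\mathrm{conc}) = \Phi\bigl(\beta_0 + \beta_1 \log_{10}(\mathrm{conc} + c_0)\bigr)$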
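For concreteness, the three mean-response functions are sketched below in Python. The function names are hypothetical, c0 denotes the additive background concentration of model 3, and the usage lines evaluate each model at arbitrary illustrative parameter values.

```python
import math
from statistics import NormalDist

_PHI = NormalDist().cdf   # standard normal cdf

def model1_probit_abbott(conc, b0, b1, c):
    """Model 1: c + (1 - c) * Phi(b0 + b1 * conc)."""
    return c + (1.0 - c) * _PHI(b0 + b1 * conc)

def model2_logit_abbott(conc, b0, b1, c):
    """Model 2: c + (1 - c) * e^eta / (1 + e^eta), eta = b0 + b1 * conc.
    Written as 1 / (1 + e^-eta), which is algebraically identical
    (math.exp overflows only for eta below about -709)."""
    eta = b0 + b1 * conc
    return c + (1.0 - c) / (1.0 + math.exp(-eta))

def model3_additive_background(conc, b0, b1, c0):
    """Model 3: Phi(b0 + b1 * log10(conc + c0)). The background acts as an
    extra concentration c0, so the control group (conc = 0) responds at
    Phi(b0 + b1 * log10(c0)); requires conc + c0 > 0."""
    return _PHI(b0 + b1 * math.log10(conc + c0))


# Illustrative evaluation: both Abbott-type models give c + (1 - c)/2 at eta = 0.
print(model1_probit_abbott(0.0, 0.0, 1.0, 0.1),
      model2_logit_abbott(0.0, 0.0, 1.0, 0.1),
      model3_additive_background(9.0, 0.0, 1.0, 1.0))
```

Any of these functions could serve as the mean function p_i(θ) in the weighted least squares formulation that follows.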


All the models are fitted based on binomial distribution theory after
pooling data across tanks within groups. This is the way that PROC PROBIT
fits models and is appropriate if there is no tank to tank heterogeneity
within groups. In the presence of tank to tank heterogeneity we can first
adjust the data by an adjustment factor and then pool across tanks within
groups.

The first model fit is a repeat of a model we fitted by PROC PROBIT and
serves to verify that we can duplicate the PROC PROBIT fits by nonlinear
regression. The second model is a logit model and illustrates that we can
fit alternative models with PROC NLIN and that the probit and logit model
fits result in very similar inferences.

The third model fit treats the background as an additive concentration
rather than adjusting for it by Abbott's correction. This sort of model would
be appropriate if the mechanism of response due to background sources is the
same as the mechanism of response due to the substance under test.

We  now  discuss  the  formulation  of  fitting  dose  response  curves 
by  means  of  nonlinear  regression  techniques. 

Suppose that there are I concentration groups (both control and treatment)
and that the i-th group contains $N_i$ subjects and has $X_i$ responses. Let
$p_i(\theta)$ denote the response probability within the i-th group. Then
$X_i \sim \mathrm{Binomial}(N_i, p_i(\theta))$.

The form of $p_i(\theta)$ is specified by the form of the dose response
model. For example, in model 1,
$p_i(\theta) = c + (1 - c)\,\Phi(\beta_0 + \beta_1\,\mathrm{conc}_i)$, where
$\theta = (\beta_0, \beta_1, c)$ is the unknown parameter vector, to be
estimated by weighted least squares.

Under the model assumptions $X_i$ has mean $\mu_i(\theta)$ and variance
$\sigma_i^2(\theta)$, where

$$\mu_i(\theta) = N_i\,p_i(\theta), \qquad \sigma_i^2(\theta) = N_i\,p_i(\theta)\,\bigl(1 - p_i(\theta)\bigr).$$

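The nonlinear regression procedure minimizes the function

$$Q(\theta) = \sum_{i=1}^{I} w_i(\theta)\,\bigl(X_i - \mu_i(\theta)\bigr)^2$$

where $w_i(\theta) = 1/\sigma_i^2(\theta)$. Jennrich and Moore [43] show that
optimizing $Q(\theta)$ by the Gauss-Newton method, with the weights
recomputed from the current parameter values at each iteration, is
equivalent to fitting the dose response curve by maximum likelihood
estimation.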
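A small Python sketch makes the Jennrich and Moore equivalence concrete. It applies Gauss-Newton to Q(θ), recomputing the weights from the current estimates at every step (iteratively reweighted least squares). To keep the normal equations 2x2, it assumes the simplified two-parameter probit p_i = Φ(β0 + β1 conc_i) with no background term; the function name and the synthetic data are hypothetical. At convergence the weighted normal equations Σ w_i (∂μ_i/∂θ)(X_i - μ_i) = 0 reduce to Σ φ(η_i)(1, conc_i)(X_i - N_i p_i)/[p_i(1 - p_i)] = 0, which are exactly the binomial likelihood equations for the probit model.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def fit_probit_irls(conc, deaths, exposed, b0=0.0, b1=0.0, tol=1e-10, max_iter=50):
    """Gauss-Newton minimisation of Q = sum_i w_i (X_i - N_i p_i)^2 for the
    two-parameter probit p_i = Phi(b0 + b1 * conc_i), with the weights
    w_i = 1 / (N_i p_i (1 - p_i)) recomputed from the current estimates at
    every iteration (iteratively reweighted least squares)."""
    for _ in range(max_iter):
        a11 = a12 = a22 = g0 = g1 = 0.0    # J'WJ entries and J'W r
        for x, d, n in zip(conc, deaths, exposed):
            eta = b0 + b1 * x
            p = min(max(_STD_NORMAL.cdf(eta), 1e-12), 1.0 - 1e-12)
            w = 1.0 / (n * p * (1.0 - p))
            j0 = n * _STD_NORMAL.pdf(eta)  # d mu_i / d b0
            j1 = j0 * x                    # d mu_i / d b1
            r = d - n * p                  # residual X_i - mu_i
            a11 += w * j0 * j0
            a12 += w * j0 * j1
            a22 += w * j1 * j1
            g0 += w * j0 * r
            g1 += w * j1 * r
        det = a11 * a22 - a12 * a12
        step0 = (a22 * g0 - a12 * g1) / det   # solve the 2x2 normal equations
        step1 = (a11 * g1 - a12 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


# Synthetic check (hypothetical data): counts built so that observed
# proportions equal Phi(-2 + conc) exactly, so the fit should return (-2, 1).
conc = [0.0, 1.0, 2.0, 3.0, 4.0]
exposed = [100] * 5
deaths = [n * _STD_NORMAL.cdf(-2.0 + x) for x, n in zip(conc, exposed)]
print(fit_probit_irls(conc, deaths, exposed))
```

The synthetic counts are fractional because they are constructed to match the model exactly; with real counts the same iteration converges to the maximum likelihood estimates.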


We  fit  the  models  to  the  Holcombe  and  Phipps  compound  D  data. 

We first consider
$p_i(\theta) = c + (1 - c)\,\Phi(\beta_0 + \beta_1\,\mathrm{conc}_i)$. The
results of fitting this model in the standard manner with PROC PROBIT are
shown in Figure XIV.11.

We  first  discuss  the  NLIN  commands  needed  to  produce  the  output. 
See  the  SAS  79  manual  [12],  pp  317-329,  for  further  details. 


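The model is

$$\mu_i(\theta) = N_i\bigl[c + (1 - c)\,\Phi(\beta_0 + \beta_1 d_i)\bigr] = N_i\,p_i(\theta)$$

where $d_i$ is the (mean) concentration in group i and
$\theta = (\beta_0, \beta_1, c)$ is to be estimated by weighted least
squares. The weights are
$w_i = w_i(\theta) = 1/[N_i\,p_i(\theta)(1 - p_i(\theta))]$. The fitting
algorithm also uses the derivatives of the mean value function. These are:

$$\frac{\partial \mu_i}{\partial \beta_0} = N_i (1 - c)\,\phi(\beta_0 + \beta_1 d_i)$$

$$\frac{\partial \mu_i}{\partial \beta_1} = N_i (1 - c)\,d_i\,\phi(\beta_0 + \beta_1 d_i)$$

$$\frac{\partial \mu_i}{\partial c} = N_i \bigl[1 - \Phi(\beta_0 + \beta_1 d_i)\bigr]$$

where $\phi(x)$ and $\Phi(x)$ represent the standard normal probability
density function and cumulative distribution function respectively.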
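The same three derivatives can be coded and checked against finite differences, which is a useful guard against typing errors in the DER. statements given to PROC NLIN. This is a minimal Python sketch; the function name and the illustrative group (80 fry at concentration 100, parameters near the Holcombe and Phipps arithmetic fit) are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def mean_and_derivs(d, n, b0, b1, c):
    """Mean mu = N [c + (1 - c) Phi(b0 + b1 d)] for a group with
    concentration d and N exposed, plus (dmu/db0, dmu/db1, dmu/dc)."""
    eta = b0 + b1 * d
    big_phi = _STD_NORMAL.cdf(eta)     # Phi, standard normal cdf
    small_phi = _STD_NORMAL.pdf(eta)   # phi, standard normal density
    mu = n * (c + (1.0 - c) * big_phi)
    d_b0 = n * (1.0 - c) * small_phi
    d_b1 = n * (1.0 - c) * d * small_phi
    d_c = n * (1.0 - big_phi)
    return mu, (d_b0, d_b1, d_c)


mu, grad = mean_and_derivs(100.0, 80, -4.52, 0.0412, 0.072)
print(round(mu, 3), [round(g, 3) for g in grad])
```

Because μ is linear in c, the c derivative matches a finite difference exactly; the β derivatives match central differences to high accuracy.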

The  SAS  commands  needed  to  generate  this  fit  are  given  below 

/* Line 1 */
PROC NLIN BEST=20 METHOD=GAUSS;
/* Line 2 */
   PARAMETERS B0=-5.0 TO -4.0 BY 0.25
              B1=0.0 TO 0.10 BY 0.025
              C=0.03 TO 0.11 BY 0.02;
/* Line 3 */
   BOUNDS 0<=C<=1.0;
   ARG=B0+B1*CONCMEAN;              /* linear predictor            */
   ARG=MAX(ARG,-5.0);               /* limit ARG to [-5, 5]        */
   ARG=MIN(ARG,5.0);
   BIGPHI=PROBNORM(ARG);            /* standard normal cdf         */
   SMLPHI=0.3989*EXP(-0.5*ARG**2);  /* standard normal density     */
   PROB=C+(1.0-C)*BIGPHI;           /* Abbott's correction         */
/* Line 4 */
   MODEL DEADFSUM=PROB*FRYSUM;
/* Line 5 */
   DER.B0=(1.0-C)*SMLPHI*FRYSUM;
   DER.B1=(1.0-C)*CONCMEAN*SMLPHI*FRYSUM;
   DER.C=(1.0-BIGPHI)*FRYSUM;
/* Line 6 */
   OUTPUT OUT=HOLPHIA PREDICTED=PRDFSM RESIDUAL=RSDFSM;
/* Line 7 */
   _WEIGHT_=1.0/(FRYSUM*PROB*(1.0-PROB));
   TITLE2 "PROBIT MODEL FIT WITH ABBOTT'S CORRECTION -- UNTRANSFORMED CONCENTRATION";
RUN;
Line 1 instructs NLIN to fit by the Gauss-Newton method. NLIN can also fit
models by the method of steepest descent or by the Marquardt method, which is
a compromise between Gauss-Newton and steepest descent. Near the optimum,
steepest descent and Marquardt methods sometimes take smaller steps, in
different directions, than the Gauss-Newton method and so are less prone to
overshoot the optimum. Thus one optimization method will sometimes converge
when another one does not. The BEST=20 option instructs NLIN to print the
locations and error sums of squares of the 20 best points from the
preliminary grid search used to determine starting values for the iterative
portion of the search.

Line  2  specifies  parameter  values  and/or  ranges  of  parameter 
values  that  NLIN  should  use  for  a  preliminary  grid  search  to  arrive  at 
starting  values  for  the  iterative  phase. 

Line  3  specifies  bounds  on  the  parameters.  If  the  parameters 
exceed  these  bounds  at  any  time  during  the  iterative  process  they  are 
forced  back  into  the  permissible  region. 

Line 4 contains the model specification. The variable DEADFSUM represents
the mortality within the i-th group. FRYSUM is the total number of fish
exposed (pooled over tanks within groups) and PROB is the response
probability in the i-th group. The SAS program statements between lines 3
and 4 are used in the specification of the model in line 4.

Line 5 contains expressions for the derivatives $\partial\mu/\partial\beta_0$,
$\partial\mu/\partial\beta_1$, and $\partial\mu/\partial c$ respectively.

Line  6  specifies  that  the  predictions  and  residuals  from  the  fit 
be  calculated  and  saved  for  future  use. 

Line  7  specifies  the  weights  that  are  to  be  used  in  the  weighted 
least  squares  fit.  Note  that  these  weights  are  functions  of  the  model 
parameters  (through  PROB).  They  are  updated  following  each  iteration. 

The output from these commands appears in Figures XV.1 to XV.4. Figure XV.1
contains a listing of the Holcombe and Phipps data. Figure XV.2 contains a
summary of the parameter values and residual sums of squares associated with
the 20 best points in the preliminary grid search. The point with the
smallest weighted residual sum of squares is used to start the iterative
Gauss-Newton search. The results of the Gauss-Newton iteration are
summarized in Figure XV.3; it converges after 8 steps. Figure XV.4 contains
statistics based on the model converged to in Figure XV.3. The upper portion
of the page contains an analysis of variance table based on weighted sums of
squares. The middle portion contains parameter estimates and asymptotic
standard errors. The bottom of the page contains the asymptotic correlation
matrix of the parameter estimates. We compare these results with those in
Figure XIV.11.


Several  points  need  to  be  remembered  in  making  the  comparison. 

We are fitting the model
$p(\theta) = c + (1 - c)\,\Phi(\beta_0 + \beta_1\,\mathrm{conc})$, whereas
PROC PROBIT parameterizes the model as
$p(\theta) = c + (1 - c)\,\Phi((\beta_0 - 5) + \beta_1\,\mathrm{conc})$.
Thus the estimates of $\beta_1$ and c in the two fits should agree, while the
PROC NLIN estimate of $\beta_0$ should be 5 smaller than the corresponding
PROC PROBIT estimate. Comparison of the estimates shows that this is the
case.

The  residual  chi  square  (0.3361)  calculated  by  SAS  PROC  PROBIT 
is  the  same  as  the  (weighted)  residual  sum  of  squares  in  the 
PROC  NLIN  fit.  Thus  this  residual  sum  of  squares  provides  a 
test  of  goodness  of  fit  of  the  model. 

The asymptotic variances and covariances calculated by PROC NLIN need to be
adjusted before being compared to those calculated by PROC PROBIT.
Combining the asymptotic standard errors and the asymptotic correlation
matrix obtained by PROC NLIN, we calculate the asymptotic
variance-covariance matrix of $(\hat\beta_0, \hat\beta_1, \hat c)$:

$$\begin{pmatrix} 0.066928 & -0.000543 & -0.000618 \\ -0.000543 & 0.00000455 & 0.00000468 \\ -0.000618 & 0.00000468 & 0.00002797 \end{pmatrix}$$

This matrix looks nothing like the asymptotic variance-covariance matrix
calculated by SAS PROC PROBIT. The reason is as follows. We stated that $X_i$
has mean $\mu_i(\theta) = N_i p_i(\theta)$ and variance
$\sigma_i^2(\theta) = N_i p_i(\theta)(1 - p_i(\theta))$. However, the
weighted least squares fit is carried out assuming that
$\mathrm{Var}(X_i) = k\,\sigma_i^2(\theta)$, where k is a constant to be
estimated from the data. The estimates of the variances and covariances
given by PROC NLIN are based on this assumption, so all variances and
covariances are multiplied by $\hat k$.

How is k estimated? Just as in weighted linear regression, k is estimated by
the residual mean square. Namely,

$$\hat k = \text{weighted residual mean square} = 0.11203710,$$

that is, the weighted residual sum of squares 0.3361 divided by its 3
degrees of freedom. Our maximum likelihood model, though, tells us that
k = 1. We thus need to adjust all variances and covariances to this value of
k. To do this, we simply divide the above variance-covariance matrix by
$\hat k$. When this is done we obtain


$$\begin{pmatrix} 0.5974 & -0.00485 & -0.00551 \\ -0.00485 & 0.0000405 & 0.0000418 \\ -0.00551 & 0.0000418 & 0.000250 \end{pmatrix}$$


This  matrix  is  nearly  the  same  as  that  calculated  by  PROC  PROBIT. 
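Both steps are mechanical: forming the covariance matrix from standard errors and correlations, then dividing by k̂. The Python sketch below uses hypothetical function names. It takes k̂ as the weighted residual sum of squares 0.3361 over 3 residual degrees of freedom (six groups, three parameters), which agrees with the 0.11203710 quoted above to four decimal places, and it uses the (β0, β1, c) matrix from the text.

```python
def cov_from_se_corr(se, corr):
    """Covariance matrix from standard errors and a correlation matrix:
    cov_ij = corr_ij * se_i * se_j."""
    m = len(se)
    return [[corr[i][j] * se[i] * se[j] for j in range(m)] for i in range(m)]

def rescale_to_k1(cov, weighted_rss, resid_df):
    """Divide a weighted least squares covariance matrix by
    k_hat = weighted residual SS / residual d.f., giving the k = 1
    (binomial maximum likelihood) covariance matrix."""
    k_hat = weighted_rss / resid_df
    return [[v / k_hat for v in row] for row in cov], k_hat


# PROC NLIN matrix for (beta0, beta1, c) as given in the text.
nlin_cov = [[0.066928, -0.000543, -0.000618],
            [-0.000543, 0.00000455, 0.00000468],
            [-0.000618, 0.00000468, 0.00002797]]
ml_cov, k_hat = rescale_to_k1(nlin_cov, weighted_rss=0.3361, resid_df=3)
print(round(k_hat, 5))
for row in ml_cov:
    print(["%.3g" % v for v in row])
```

The printed rows agree with the rescaled matrix above up to rounding of the inputs.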

An important purpose of fitting the probit model is to calculate lower
confidence bounds on safe concentrations by use of Fieller's theorem
(Finney [11], pp. 78-79). That is, we wish to calculate a lower bound on the
concentration such that $\Phi(\beta_0 + \beta_1\,\mathrm{conc}) = L$, where L
is some specified response rate. Such lower confidence bounds, at (one-sided)
confidence level 97.5 percent, are a standard part of the PROC PROBIT output.
They are given in Figure XIV.12 for the Holcombe and Phipps compound D data
with untransformed concentration. We indicate below how to calculate these
bounds for any confidence level, based on the output from PROC NLIN. The
theory underlying these calculations is sketched in Appendix AXV.


The fitted model is
$\hat p(\mathrm{conc}) = \hat c + (1 - \hat c)\,\Phi(\hat\beta_0 + \hat\beta_1\,\mathrm{conc})$.

We wish to construct a $1 - \alpha$ level confidence interval on the
concentration $\mathrm{conc}_L$ such that

$$\Phi(\beta_0 + \beta_1\,\mathrm{conc}_L) = L$$

where L is specified (e.g., 0.01, 0.05, 0.10, etc.). L represents the
response level attributable to the toxicant (i.e., over and above
background).

The point estimate of $\mathrm{conc}_L$ is

$$\widehat{\mathrm{conc}}_L = \bigl(\Phi^{-1}(L) - \hat\beta_0\bigr)/\hat\beta_1 = (\zeta_L - \hat\beta_0)/\hat\beta_1 .$$


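Let the asymptotic variance-covariance matrix of $(\hat\beta_0, \hat\beta_1)$
be denoted as

$$\begin{pmatrix} g & h \\ h & j \end{pmatrix} = \begin{pmatrix} \mathrm{Var}(\hat\beta_0) & \mathrm{Cov}(\hat\beta_0, \hat\beta_1) \\ \mathrm{Cov}(\hat\beta_0, \hat\beta_1) & \mathrm{Var}(\hat\beta_1) \end{pmatrix}.$$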


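A $1 - \alpha$ confidence interval on $\text{conc}_L$ is shown in Appendix AXV to be (provided $A > 0$)

$$\frac{-B - \sqrt{B^2 - 4AC}}{2A} \le \text{conc}_L \le \frac{-B + \sqrt{B^2 - 4AC}}{2A},$$

where

$$A = \hat\beta_1^2 - j\,z_{\alpha/2}^2,$$
$$B = 2\left[\hat\beta_1(\hat\beta_0 - f_L) - h\,z_{\alpha/2}^2\right],$$
$$C = (\hat\beta_0 - f_L)^2 - g\,z_{\alpha/2}^2,$$

and $z_{\alpha/2}$ is the upper $\alpha/2$ point of the standard normal distribution.

The quantities $\hat\beta_0$, $\hat\beta_1$, $g$, $h$, $j$ are obtained as output from NLIN. The results of the calculations are given below.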
Results from calculation of upper and lower 95 percent confidence bounds on various percentiles of the PROBIT fit, by Fieller's Theorem

L       f_L = Φ^{-1}(L)   Lower 95% conf. limit   Upper 95% conf. limit
0.01    -2.345             22.958                  69.385
0.03    -1.88              39.00                   78.23
0.05    -1.645             47.07                   82.73
0.07    -1.48              52.73                   85.91
0.10    -1.28              59.96                   89.78
0.15    -1.03              68.06                   94.66
0.20    -0.83              74.80                   98.62
0.25    -0.68              79.82                  101.63
0.50     0                101.69                  116.15

These confidence bounds are seen to agree very closely with the bounds calculated by SAS PROC PROBIT, which appear in Figure XIV.12.
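The Fieller calculation above can be sketched in a few lines of Python. This is a minimal sketch, not the program used for the tables: the function name `fieller_bounds` is our own, g, h, j are taken from the rescaled probit matrix above (in the order Var(β̂0), Cov(β̂0, β̂1), Var(β̂1)), and the estimates β̂0 ≈ -4.523, β̂1 ≈ 0.04115 are an assumption, back-calculated by us from the probit point estimates at L = 0.05 and L = 0.50 rather than read from the NLIN output.

```python
from math import sqrt
from statistics import NormalDist


def fieller_bounds(beta0, beta1, g, h, j, f_L, alpha=0.05):
    """Fieller 1 - alpha interval for conc_L solving beta0 + beta1 * conc_L = f_L.

    g = Var(beta0), h = Cov(beta0, beta1), j = Var(beta1).
    """
    z2 = NormalDist().inv_cdf(1 - alpha / 2) ** 2  # z_{alpha/2} squared
    A = beta1 ** 2 - j * z2
    B = 2 * (beta1 * (beta0 - f_L) - h * z2)
    C = (beta0 - f_L) ** 2 - g * z2
    disc = B * B - 4 * A * C
    if A <= 0 or disc < 0:
        # Slope not significantly different from zero: no finite interval.
        raise ValueError("Fieller interval is unbounded for these inputs")
    root = sqrt(disc)
    return (-B - root) / (2 * A), (-B + root) / (2 * A)


# Assumed inputs (see lead-in): probit fit at L = 0.50, so f_L = Phi^{-1}(0.5) = 0.
beta0, beta1 = -4.523, 0.04115
g, h, j = 0.5974, -0.00485, 0.0000405
f_L = NormalDist().inv_cdf(0.50)

point = (f_L - beta0) / beta1
lower, upper = fieller_bounds(beta0, beta1, g, h, j, f_L)
print(f"point {point:.2f}, 95% interval ({lower:.2f}, {upper:.2f})")
```

With these assumed inputs the interval comes out close to the tabulated values at L = 0.50. Because B² - 4AC is a small difference of large numbers, the bounds are sensitive to rounding in β̂0 and β̂1, so full-precision estimates from the fitting program should be used in practice.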

The previous dose response model fitted by use of PROC NLIN was a repeat of a model that had also been fitted by PROC PROBIT. Comparisons of the PROC PROBIT and PROC NLIN outputs verified that dose response models can in fact be fitted by nonlinear regression programs, and helped to interpret the various features of the PROC NLIN output.

We now consider the fitting of two dose response models that cannot be fitted by PROC PROBIT. This, of course, is the reason for considering the application of PROC NLIN for dose response estimation in the first place. We first consider the logistic model and then look at an alternative to Abbott's correction for accounting for background response.

The logistic model is a commonly used dose response model and gives results very similar to probit fits, at least between the 2nd and 98th percentiles. The logistic c.d.f. is

$$F(x) = \frac{1}{1 + e^{-x}}, \qquad -\infty < x < \infty,$$

and is a symmetric unimodal distribution like the normal, but has heavier tails. We fit the dose response model

$$p(\text{conc}) = c + (1 - c)\,F(\beta_0 + \beta_1\,\text{conc})$$

in direct analogy to the probit fit that appears in Figure XIV.11.


The results of the Gauss-Newton iterative process are given in Figure XV.6. The Marquardt algorithm converges whereas the Gauss-Newton algorithm does not, because the Marquardt algorithm can take smaller steps and is more flexible in direction. However, both algorithms arrive at nearly the same parameter estimates. The summary of the fitted dose response model appears in Figure XV.7. We can compare this fit to the probit fit in Figure XV.4.

We see that both the logit model and the probit model fit the data quite well (residual sums of squares are quite small). The background mortality rate is estimated to be about 0.07 by each model. The asymptotic variance-covariance matrix of the logit fit parameters is estimated to be

$$
\widehat{\mathrm{Var}} = \frac{1}{0.093958}
\begin{pmatrix} 0.48086 & 0 & 0 \\ 0 & 0.00388 & 0 \\ 0 & 0 & 0.00518 \end{pmatrix}
\begin{pmatrix} 1.0000 & -0.988351 & -0.550366 \\ -0.988351 & 1.0000 & 0.520791 \\ -0.550366 & 0.520791 & 1.0000 \end{pmatrix}
\begin{pmatrix} 0.48086 & 0 & 0 \\ 0 & 0.00388 & 0 \\ 0 & 0 & 0.00518 \end{pmatrix}
=
\begin{pmatrix} 2.46095 & -0.019626 & -0.01459 \\ -0.019626 & 0.000160 & 0.000111 \\ -0.01459 & 0.000111 & 0.000286 \end{pmatrix},
$$

so that $g = 2.46095$, $h = -0.019626$, and $j = 0.000160$.
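The matrix identity for the logit fit (diagonal matrix of standard errors, times the correlation matrix, times the diagonal matrix again, divided by 0.093958) can be checked numerically. This is a minimal sketch in plain Python; the helper name `cov_from_se_corr` is ours, and we assume the parameter order (β̂0, β̂1, c) that matches the labels g, h, j.

```python
def cov_from_se_corr(se, corr, scale=1.0):
    """Return D R D / scale, where D = diag(se) and R is a correlation matrix."""
    n = len(se)
    return [[se[i] * corr[i][k] * se[k] / scale for k in range(n)] for i in range(n)]


se = [0.48086, 0.00388, 0.00518]          # standard errors, order (beta0, beta1, c)
corr = [[1.0, -0.988351, -0.550366],
        [-0.988351, 1.0, 0.520791],
        [-0.550366, 0.520791, 1.0]]
V = cov_from_se_corr(se, corr, scale=0.093958)

g, h, j = V[0][0], V[0][1], V[1][1]       # inputs to the Fieller calculation
for row in V:
    print(["%.6g" % v for v in row])
```

Only the upper-left 2 x 2 block (g, h, j) enters the Fieller bounds on conc_L; the row and column for c are carried along but not used there.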


We now apply Fieller's procedure for calculating lower and upper confidence bounds on distribution percentiles of the dose response fit. We need only modify the calculations done for the probit fit by defining $f_L = \ln[L/(1 - L)]$, the logistic counterpart of $\Phi^{-1}(L)$.
Results from calculations of upper and lower 95 percent confidence bounds on various percentiles of the LOGIT fit, by Fieller's Theorem

L       f_L = ln[L/(1 - L)]   LOGIT lower 95%   LOGIT upper 95%   LOGIT point   PROBIT point
                              conf. limit       conf. limit       estimate      estimate
0.01    -4.5951                 2.515            66.209            45.31         52.931
0.03    -3.4761                26.859            78.151            61.205        64.231
0.05    -2.9444                38.390            83.863            68.757        69.942
0.07    -2.5867                46.127            87.726            73.838        73.952
0.10    -2.1972                54.526            91.957            79.370        78.812
0.15    -1.7346                64.454            97.031            85.940        84.887
0.20    -1.3863                71.879           100.900            90.888        89.747
0.25    -1.0986                77.967           104.142            94.974        93.392
0.50     0                    100.441           117.293           110.578       109.916
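The only change between the probit and logit Fieller calculations is the value of f_L. The sketch below (plain Python; the function names are ours) reproduces the f_L column of the logit table and sets it against the corresponding normal quantiles. The ratio of the two grows as L decreases, which is one way to see the heavier logistic tails that the discussion below relies on.

```python
from math import log
from statistics import NormalDist


def f_probit(L):
    """Normal quantile, f_L = Phi^{-1}(L)."""
    return NormalDist().inv_cdf(L)


def f_logit(L):
    """Logistic quantile, f_L = ln[L / (1 - L)]."""
    return log(L / (1 - L))


for L in (0.01, 0.03, 0.05, 0.10, 0.25):
    ratio = f_logit(L) / f_probit(L)
    print(f"L={L:.2f}  logit {f_logit(L):8.4f}  probit {f_probit(L):8.4f}  ratio {ratio:.3f}")
```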

The point estimates of the probit and logit fit percentiles are presented side by side for comparison. Except at L = 0.01 they are very close, and even at L = 0.01 they are similar. The situation is a bit different with respect to confidence bounds on the safe concentration.

The logit confidence bounds are to be compared with the probit confidence bounds. We see that the upper logit and probit confidence bounds are very similar at each percentile. However, the lower confidence bounds for the logit fit are somewhat lower than the lower confidence bounds for the probit fit at the low distribution percentiles.

For L ≥ 0.07, the lower confidence bounds for the logit and probit fits are rather similar, the lower logit bounds being consistently smaller than the lower probit bounds. For L below 0.05 this phenomenon is accentuated, especially at L = 0.01. This is the region in which mortality due to background is the first-order effect while toxicant-related mortality is secondary. Thus the data and the fitted model reflect primarily the background effects and provide little direct evidence about toxicant-related mortality. Since the tails of the logistic distribution are heavier than the tails of the normal distribution, changes in parameter values perturb percentile estimates in the normal distribution much less than they do in the logistic distribution. Thus the lower logistic confidence limits become much wider than the corresponding lower normal limits as L → 0. This phenomenon holds very strongly in this example at L = 0.01 (lower bound 2.515 for the logit fit versus 22.958 for the probit fit) and to some extent at L = 0.03 and 0.05. In this region the data provide little basis to choose between the logit and probit fits. Both models fit the data well and yield very similar point estimates. Thus we learn the following lesson:

THE  LOWER  CONFIDENCE  BOUNDS  ON  "SAFE"  CONCENTRATIONS  CORRESPONDING  TO 
LOW  PERCENTILES  OF  THE  DOSE  RESPONSE  DISTRIBUTION  MAY  BE  SENSITIVE  TO 
THE  PARTICULAR  FORM  ASSUMED  FOR  THE  DOSE  RESPONSE  RELATION,  EVEN  THOUGH 
SEVERAL  MODELS  MAY  FIT  THE  DATA  EQUALLY  WELL  AND  PROVIDE  SIMILAR  POINT 
ESTIMATES  OF  PERCENTILES.  THE  DATA  MAY  NOT  BE  SUFFICIENT  TO  DISTINGUISH 
BETWEEN  THE  MODELS. 

This phenomenon is observed quite frequently in very low dose extrapolation based on the results of carcinogenesis experiments, although in those applications the extrapolation is much more extreme than in fish toxicology applications. Nevertheless, this example illustrates that even in fish toxicology situations the inference about safe dose can be very sensitive to model assumptions, even at the first to the third percentile. The extent of background effects may prevent us from distinguishing among alternative models which fit the data about equally well but which yield qualitatively different inferences about safe concentrations corresponding to low distribution percentiles.

To partially circumvent this problem, we consider an alternative approach to dose response estimation based on fewer assumptions about the shape of the response distribution. See the following section for a discussion of this nonparametric approach to dose response estimation.

We consider now a third example of fitting dose response models by means of nonlinear regression. This example involves a nonstandard model which provides an alternative to Abbott's correction to account for background response. Abbott's correction is appropriate when the mechanism associated with background effects is independent of the mechanism associated with toxicant effects. For example, toxicant mortality may be due to chemical effects whereas background mortality may be due to increased handling of the fish.

However, Stephan [42] criticizes the assumption that the control mortality mechanism is totally independent of the toxicant mortality mechanism. He states that stressing the fish during the acclimation or testing periods may make them more susceptible to the toxicant. Thus background effects may act like additions to the toxicant concentrations. Stephan suggests not correcting for control mortality when assessing the effects of various toxicant concentrations.

An alternative way to reflect the dependence between background and toxicant mortality mechanisms is to fit a model which reflects the fact that background may function as an addition to the effective toxicant concentration. Assume that background effects are equivalent to an addition of c µg/liter in toxicant concentration. The quantity c is a model parameter to be estimated from the data. (Note that the usage of the notation c is completely different in this example than in the previous examples in this section. Here it is being used as a concentration, whereas in previous examples the symbol c represented a probability.)

Consider the following models based on an assumed normal dose response curve.

(1) $p(\text{conc}) = \Phi[\beta_0 + \beta_1(\text{conc} + c)]$

(2) $p(\text{conc}) = \Phi[\beta_0 + \beta_1(\log_{10}(\text{conc} + c) - 3.0)]$.

These models are to be fitted to the data by maximum likelihood estimation, based on binomial distribution theory. The parameters $\beta_0$, $\beta_1$, c are to be estimated from the data. The first model is overparameterized, in that $\beta_0$ and c cannot be separated from one another. Thus to fit model (1) we fit

$$p(\text{conc}) = \Phi[(\beta_0 + \beta_1 c) + \beta_1\,\text{conc}] = \Phi(\alpha_0 + \beta_1\,\text{conc}), \qquad \alpha_0 = \beta_0 + \beta_1 c,$$

using PROC PROBIT with untransformed concentration and no "background" effect included.
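The overparameterization of model (1) is easy to see numerically: any two parameter sets with the same α0 = β0 + β1c give identical response probabilities at every concentration. The parameter values below are hypothetical and serve only to illustrate the point; `model1` is our own name.

```python
from statistics import NormalDist

Phi = NormalDist().cdf


def model1(conc, beta0, beta1, c):
    """Model (1): p(conc) = Phi[beta0 + beta1 * (conc + c)]."""
    return Phi(beta0 + beta1 * (conc + c))


# Hypothetical parameter sets sharing alpha0 = beta0 + beta1 * c = -3.8
set_a = (-4.0, 0.04, 5.0)
set_b = (-4.4, 0.04, 15.0)
concs = [0.0, 25.0, 50.0, 100.0, 150.0]

diffs = [abs(model1(x, *set_a) - model1(x, *set_b)) for x in concs]
print(max(diffs))  # differences are at rounding-error level only
```

The data can therefore determine only α0 and β1, which is why model (1) reduces to an ordinary probit fit in untransformed concentration.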


The centering constant 3.0 in model (2) is intended to reduce the correlation between $\hat\beta_0$ and $\hat\beta_1$, thereby improving the convergence properties of the fitting algorithms. To fit model (2) we carry out a maximum likelihood analysis using PROC NLIN. The output from this analysis is shown in Figures XV.8 and XV.9. The Marquardt algorithm again achieves convergence whereas the Gauss-Newton algorithm does not. Note, however, that the Gauss-Newton algorithm attains a smaller residual sum of squares due to the difference in weighting. (The distinction between attaining the smallest residual sum of squares and attaining a stationary point corresponds to the difference between minimum chi square estimation and maximum likelihood estimation. This distinction is discussed in Jennrich and Moore [43], page 10, and in the BMDP manual [27]. Both of these methods are asymptotically equivalent.) The summary of the Marquardt algorithm fit is presented in Figure XV.9. The residual sum of squares represents a chi square test for goodness of fit of the model. We see that

residual chi square = 59.32 with 3 d.f.

Thus the model does not seem to fit the data. We break down this residual chi square into individual cell components to determine whether the large residual chi square represents a consistent lack of fit or the contribution of a single aberrant cell.
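To quantify how badly model (2) fits, the upper tail probability of 59.32 on 3 d.f. can be computed. Python's standard library has no chi-square distribution, so this sketch relies on the closed form that holds only for exactly 3 degrees of freedom, P(χ² > x) = erfc(√(x/2)) + √(2x/π) e^(-x/2); the function name is ours.

```python
from math import erfc, exp, pi, sqrt


def chi2_sf_3df(x):
    """Upper tail P(X > x) for a chi-square variable with exactly 3 d.f."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)


print(chi2_sf_3df(7.815))   # near the usual 5 percent critical value for 3 d.f.
print(chi2_sf_3df(59.32))   # the residual chi square reported for model (2)
```

The second value is many orders of magnitude below any conventional significance level, consistent with rejecting model (2) for these data.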

We see that the expected frequencies, $N\hat p$ and $N(1 - \hat p)$, are quite large under the model fit and that there is a systematic discrepancy between model and data. Namely, the model underestimates p at the lower and upper ends and overestimates p in the middle.
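The breakdown into cell components can be sketched as a Pearson-type decomposition. For a group with N animals, x responses, and fitted probability p̂, the dead and alive cells together contribute (x - Np̂)²/[Np̂(1 - p̂)] to the chi square, and the signed residual (x - Np̂)/√[Np̂(1 - p̂)] shows the direction of the misfit. The group sizes, counts, and fitted probabilities below are hypothetical, not the Holcombe and Phipps data, and were chosen only to mimic the pattern just described.

```python
from math import sqrt


def pearson_residuals(n, x, p):
    """Signed Pearson residuals (x_i - n_i p_i) / sqrt(n_i p_i (1 - p_i)) per group."""
    return [(xi - ni * pi_) / sqrt(ni * pi_ * (1 - pi_)) for ni, xi, pi_ in zip(n, x, p)]


# Hypothetical groups: animals on test, observed deaths, fitted probabilities
n = [40, 40, 40, 40, 40]
x = [6, 8, 15, 30, 39]
p_fit = [0.10, 0.25, 0.45, 0.70, 0.92]

res = pearson_residuals(n, x, p_fit)
components = [r * r for r in res]      # cell contributions to the chi square
print([round(r, 2) for r in res], round(sum(components), 2))
```

A run of positive residuals at both ends with negative residuals in the middle points to systematic lack of fit rather than one aberrant cell.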

We  thus  conclude  that  model  (2)  is  not  appropriate  for  this  set  of 
data.  However  this  or  similar  models  may  be  appropriate  for  other  sets 
of  data.  The  point  is  that  the  use  of  nonlinear  regression  techniques 
to  fit  dose  response  curves  greatly  expands  the  variety  of  models  that 
we  can  fit  to  the  data. 

Since model (2) does not fit the data well, we do not use it to calculate lower confidence bounds on "safe" dose. However, these calculations can easily be made by use of asymptotic maximum likelihood theory.




Figure XV.1 Holcombe and Phipps compound D mortality and concentration data, pooled
Figure XV.2 Output from SAS PROC NLIN applied to Holcombe and Phipps data, untransformed
Figure XV.3 Output from SAS PROC NLIN applied to Holcombe and Phipps data, untransformed




Figure XV.4 Output from SAS PROC NLIN applied to Holcombe and Phipps data, untransformed
Figure XV.8 Use of SAS PROC NLIN to fit nonstandard dose response model to data
Figure XV.9 Output from SAS PROC NLIN to fit nonstandard dose response model to data


XVI.  NONPARAMETRIC  LOWER  CONFIDENCE  BOUNDS  ON  SAFE  CONCENTRATIONS 


In this section we again consider the estimation of "safe" concentrations based on fitted dose response curves. We wish to estimate a lower confidence bound on that concentration, $c_L$, for which the response rate is no more than L greater than the control group rate. The value of L is specified by the user. We present this situation pictorially below.

[Figure: dose response curve, response rate versus concentration (CONC.)]


The dose response curve is assumed to be concave upward at the lower percentiles. For a logit or probit fit this would be below the median of the distribution. The solid portion of the illustration represents the region where the dose response curve is concave upward.

The upper bound on the upward concavity region is denoted UCR. UCR is specified by the user.

Let $c_0 < c_1 < c_2 < \dots < c_r < \text{UCR}$ denote the test concentrations (treatment and control) in the upward concavity region. $c_0$, the control group concentration, would often be 0.

The standard method of estimating $c_L$ by means of dose response curves is to assume a specific form for the dose response curve, such as probit or logit in concentration or in log concentration, and then fit the model by means of maximum likelihood estimation, based on all the data. SAS PROC PROBIT or a nonlinear regression package can be used to fit such models.

The procedure discussed in this section has a number of important differences from these standard parametric dose response models. Among these are:

1. Inferences about safe concentrations can sometimes be rather sensitive to the particular form of the dose response curve assumed. Yet it may not be possible to distinguish among such competing models based on the data at hand. The need for such strong parametric assumptions is alleviated with the procedure in this section.

2. Once a functional form is chosen for the dose response curve, there is still the question of the dose metameter. Different lower bounds may result depending on whether the probit (say) model is chosen with respect to concentration, log(concentration), or some other function of concentration. There is no need to worry about the specific form of the dose metameter with the nonparametric procedure of this section.

3. The parametric dose response models assume a specific functional form for the correction for background responses; Abbott's formula (Finney [11], pp. 125-126) is commonly used. However, estimates of the low percentiles of the dose response curve can be sensitive to the specific form of background correction used. The procedure in this section does not require the specification of any particular functional form for background response.

4. The standard method of fitting a parametric dose response model is by means of maximum likelihood estimation. The theoretical justification is based on the assumptions of large samples and asymptotic normality. These assumptions may not be entirely satisfied in the case of relatively small sample sizes or of many response group probabilities at or near 0 percent or 100 percent. By contrast, the method discussed in this section is based on exact small-sample theory and so is appropriate irrespective of small sample sizes or extreme response rates. We also present an alternative confidence bound calculation which may yield higher lower bounds; however, this alternative approach depends on large-sample theory and asymptotic normality. Both estimates are routinely calculated by our computer program.

5. The standard parametric probit or logit dose response curve fits utilize the information from all the test concentrations, including those high concentrations at the upper end of the dose response curve, far away from the safe concentration. In fact, these upper concentrations, with high response rates, are very instrumental in determining the slope estimate and the associated precision estimate. These high concentrations thus carry considerable weight, through the specification of the model, in estimating response behavior at the low concentrations. This is not desirable, since the same functional form may not be appropriate throughout the entire range of concentrations. By contrast, the method in this section uses information only from those concentration groups where the dose response curve is concave upward. This is generally the region below the median of the dose response curve.
One assumption made throughout this section is that the response results can be modelled with the binomial distribution within each tank and that there is no evidence of tank to tank heterogeneity within treatment groups. The responses can then be pooled across tanks within treatment groups, and we can assume a single binomial distribution for the pooled responses within each treatment group. This distributional assumption is made in our program.

What do we do if there is in fact evidence of tank to tank heterogeneity within treatment groups? There are three approaches to account for this situation; see Section IX for a detailed discussion. Briefly, these approaches are:

1. Carry out analyses on a per tank basis rather than on a per fish basis. This is the approach that is currently being used by some researchers. However, this approach greatly reduces the number of degrees of freedom available for analysis. I feel it is too conservative.

2. Fit distributional models to the data that explicitly account for such tank to tank heterogeneity. Models that have been proposed include the beta binomial model (Williams [21]) and the correlated binomial model (Kupper and Haseman [22]). These models generalize the binomial distribution model and can be incorporated into a dose response curve estimation model. The fitting would be by maximum likelihood estimation, and the statistical inferences would be based on asymptotic normal distribution theory.

3. We can adjust the data to reflect the within-tank correlation. Namely, tank to tank heterogeneity reflects itself as variation in response rate from tank to tank within treatment groups. This can also be regarded as correlation of responses within individual tanks. The effect of such correlation is to reduce the precision of estimates as compared to what it would be in a binomial model, since the correlations will usually be positive. This reduced precision can be accounted for in a workmanlike manner by reducing the effective sample size within each tank. Namely, suppose we have 4 tanks per group, 25 fry per tank, and responses 1, 3, 8, 7, respectively. The effect of assuming a binomial model would be to pool data across tanks within groups, so that we have 100 fry and 19 responses. Thus $\hat p = .19$ and

$$\sqrt{\mathrm{Var}(\hat p)} = \sqrt{(.19)(1 - .19)/100} = .039.$$

However, correlation within tanks inflates the variance by a factor h (h > 1). Reduce the assumed sample size within each tank from 25 to 25/h. Correspondingly, reduce the effective number of responses within each tank to 1/h, 3/h, 8/h, 7/h, for a total of 19/h. Thus $\hat p = (19/h)/(100/h) = .19$ still. However,

$$\sqrt{\mathrm{Var}(\hat p)} = \sqrt{(.19)(1 - .19)/(100/h)} = .039\sqrt{h}.$$

We then disregard the tank to tank heterogeneity and utilize the binomial-based procedures, such as the computer program discussed in this section.


This method of adjustment to effective sample sizes is approximate and somewhat crude; however, it has the advantage of simplicity, and no special computer programs need be used. Namely, the same methods that are utilized in the absence of tank to tank heterogeneity are used in the presence of such heterogeneity, only with reduced sample sizes. This allows for the use of standard analysis tools in nonstandard situations.

Looked  at  from  the  perspective  of  reducing  sample  sizes 
to  an  effective  sample  size,  carrying  out  analyses  on  a  per 
tank  basis  is  like  reducing  the  effective  sample  size  in  a 
tank  all  the  way  down  to  1.  I  feel  that  this  is  going  a  bit 
too  far. 

In particular, the methods discussed in this section can be utilized following such adjustments to account for tank to tank heterogeneity. Thus from now on in this section we ignore the question of tank to tank heterogeneity within groups and discuss our procedure, based on binomial distribution theory, as if there were no tank to tank heterogeneity.
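The effective sample size adjustment in item 3 amounts to dividing both counts by h before pooling. This is a minimal sketch using the worked example from item 3; the function name is ours, and h = 4 is purely illustrative, since the text leaves the choice of h to the analyst.

```python
from math import sqrt


def pooled_rate_and_se(responses, tank_sizes, h=1.0):
    """Pool tanks after shrinking counts by the inflation factor h (h = 1 is plain binomial)."""
    n_eff = sum(tank_sizes) / h
    x_eff = sum(responses) / h
    p_hat = x_eff / n_eff
    return p_hat, sqrt(p_hat * (1 - p_hat) / n_eff)


responses = [1, 3, 8, 7]      # worked example from the text: 4 tanks of 25 fry
tank_sizes = [25, 25, 25, 25]
p1, se1 = pooled_rate_and_se(responses, tank_sizes)         # binomial, h = 1
p4, se4 = pooled_rate_and_se(responses, tank_sizes, h=4.0)  # hypothetical h = 4
print(p1, round(se1, 3), p4, round(se4, 3))
```

The point estimate is unchanged while the standard error scales by √h, exactly as in the hand calculation; the reduced counts can then be passed to any binomial-based procedure.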

We now consider the details of the nonparametric dose response procedure.

Assume that k is such that $c_L \le c_k \le c_r < \text{UCR}$. The value of k is specified by the user of the program.


[Figure: response rate versus concentration (CONC)]
Let $p_0$, $p_k$ denote the true response rates at $c_0$, $c_k$, respectively. Draw a chord joining the points $(c_0, p_0)$ and $(c_k, p_k)$. Let $\beta$ denote the slope of this chord. Thus

$$\beta = \frac{p_k - p_0}{c_k - c_0}.$$

Upward concavity implies that the chord lies above the dose response curve throughout the region $(c_0, c_k)$. The concentration at which this chord crosses the value $p_0 + L$ on the response scale is $c_0 + L/\beta$. If $c_0 = 0$, this concentration is $L/\beta$. Thus

$$c_0 + L/\beta \le c_L,$$

and so if we can place a lower confidence bound on $c_0 + L/\beta$, then this also serves as a nonparametric lower confidence bound on $c_L$.

Let $\hat p_0, \hat p_1, \dots, \hat p_r$ be the estimated response rates, based on binomial theory, at concentrations $c_0, c_1, \dots, c_r$, respectively. Then $N_i \hat p_i \sim B(N_i, p_i)$, $i = 0, 1, \dots, r$, where $N_i$ is the number of animals on test in the i-th treatment group pooled across tanks and $p_i$ is the true response probability within the i-th treatment group.

Let $\tilde p_0$ denote a lower confidence bound on $p_0$ and $\tilde p_k$ an upper confidence bound on $p_k$. Such exact lower and upper confidence bounds were derived by Clopper and Pearson [46] and are valid for small sample sizes. Expressions for them are contained in a number of sources, including Hollander and Wolfe [47], pages 23-24. Charts for these confidence intervals are given in a number of places, including Dixon and Massey [13], pp. 501-504. Expressions for these confidence bounds are given in Appendix AXVI.1.
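The exact Clopper-Pearson bounds can also be computed without charts by inverting the binomial distribution function. This sketch uses bisection on a binomial c.d.f. built from Python's standard library; it is not the expression in Appendix AXVI.1, the function names are ours, and each bound is one-sided at level 1 - α.

```python
from math import comb


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(x + 1))


def _root_increasing(g, lo=0.0, hi=1.0, iters=100):
    """Bisection for the root of an increasing function g on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha):
    """Exact one-sided (lower, upper) bounds on p, each at level 1 - alpha."""
    # Lower bound: P(X >= x | p) = alpha; this tail probability increases with p.
    lower = 0.0 if x == 0 else _root_increasing(lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha)
    # Upper bound: P(X <= x | p) = alpha; this c.d.f. decreases with p.
    upper = 1.0 if x == n else _root_increasing(lambda p: alpha - binom_cdf(x, n, p))
    return lower, upper


# Example: 0 deaths among 10 fry, one-sided level 0.9875 (alpha = 0.0125)
print(clopper_pearson(0, 10, 0.0125))
```

For x = 0 the upper bound has the closed form 1 - α^(1/n), which gives a convenient check on the bisection.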

An upper confidence bound, $\beta_U$, on $\beta$ is

$$\beta_U = \frac{\tilde p_k - \tilde p_0}{c_k - c_0}.$$

Thus a lower confidence bound on $c_L$ is $c_0 + L/\beta_U$. This confidence bound is valid in small samples.
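Once p̃0 and p̃k are available, the small-sample bound is a one-line computation. The sketch below takes those bounds as inputs (for instance from a Clopper-Pearson routine); the function name and the numbers in the example are hypothetical.

```python
def chord_lower_bound(c0, ck, p0_lower, pk_upper, L):
    """Lower confidence bound c0 + L / beta_U on c_L, where
    beta_U = (pk_upper - p0_lower) / (ck - c0) bounds the chord slope from above."""
    if ck <= c0:
        raise ValueError("need ck > c0")
    beta_u = (pk_upper - p0_lower) / (ck - c0)
    if beta_u <= 0:
        raise ValueError("upper slope bound must be positive")
    return c0 + L / beta_u


# Hypothetical inputs: control at 0, group k at 10 ug/liter
print(chord_lower_bound(c0=0.0, ck=10.0, p0_lower=0.0, pk_upper=0.2, L=0.05))
```

Because β_U overstates the chord slope with the stated confidence, L/β_U understates the distance from c0 to c_L, which is what makes the bound conservative.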

The results in the concentration groups $c_{k+1}, c_{k+2}, \dots, c_r$ can be used to improve on the confidence bound discussed above. The details of this procedure, along with a description of an alternative confidence bound, valid in large samples, are discussed in the writeup "A Computer Program to Calculate Nonparametric Lower Confidence Bounds on Safe Concentrations in Quantal Response Toxicity Tests" by Feder and Sherrill [41]. This writeup also describes in detail the use of a computer program to implement this procedure. This document is included as an appendix to this section. We illustrate the results of this program by example in the remainder of this section and compare the nonparametric estimates of safe concentration with those based on the logit or probit fits.

We  first  consider  the  DeFoe  compound  C  fry  mortality  data. 


We  have  seen  from  previous  sections  that  there  is  no  evidence  of  tank  to 
tank  heterogeneity  within  treatment  groups. 

The various portions of the computer program output are numbered, and we discuss them in detail.

As a number of the parameter values used in the program were chosen rather arbitrarily (e.g. UCR), we should regard the output as illustrative of the algorithm's working rather than as a definitive answer in this particular case. We know that the algorithm will give conservative answers. The question is just how conservative the algorithm is.


We know from the preliminary plots and tests of homogeneity that there is no concentration-related trend in embryo mortality. Such preliminary analyses are very important to carry out in order to gain an understanding of the structure of the data. This helps us to interpret the results of procedures such as the one in this section.

The  numbered  descriptions  below  refer  to  the  similarly  numbered 
descriptions  in  the  computer  printout  for  the  DeFoe  fry  mortality  data. 

1. The title of the output. This title appears at the head of every page.

2. The basic data are presented for each tank within each concentration group (treatment and control). Numbers of fry per tank, numbers survived, and toxicant concentration are given.

3. The number and the proportion of dead fry within each group are given. These values are calculated by pooling across tanks within groups.

4. Basic parameter values for the procedure.

L = response rate, over and above the control rate, at the "safe" concentration.

k = the index of the assumed upper bound on the "safe" concentration, $c_L$, i.e. $c_0 \le c_L \le c_k$.

UCR = upper limit of the concave upward region in the dose response curve.

A number of confidence bound calculations are carried out for differing combinations of (L, k, UCR). In this example UCR is specified as 50. This places it just above the fifth treatment group ($\hat p_5 = .225$, $\hat p_6 = 1.000$). Thus r = 5 and $c_r = 48.3074$ in this problem.

L and k are varied:

L = .01, .05, .10
k = 3, 4, 5.

L = .01, k = 3, UCR = 50.0 in the first calculation.

5. Simultaneous confidence interval adjustments are made in this run by means of Bonferroni's inequality with familywise confidence level 0.95. Thus all small-sample confidence intervals are calculated at individual confidence level 1 - (.05/4) = 0.9875.

6. Upper and lower confidence limits are calculated at each concentration group. These are exact, small-sample confidence intervals, calculated as discussed by Clopper and Pearson using the expressions in Appendix AXVI.1.

7. Straight line approximations to the dose response curve are calculated using the combinations of treatment groups shown. The specific method of calculation of the slopes is discussed in the program documentation in Appendix AXVI.2 (Feder and Sherrill [41]). For each combination of concentrations, CONC MEAN is the arithmetic average of the concentrations, and slope (normal approx) and slope (small sample) are the calculated values of $\beta_U$ based on either asymptotic theory or exact small-sample theory. See the program documentation for details.

8. Lower confidence bounds on $c_L$ are calculated using the minimum of the slopes in paragraph 7 (in this case .0056 for the normal approximation and .0084 for the exact approach).

The values given under "calculated safe dose" are $c_1 + L/\beta_U$. These are $.0494 + .01/.0056 \approx 1.845$ and $.0494 + .01/.0084 \approx 1.24$ for the normal theory and small sample calculations, respectively.


Since we have taken k = 3, $c_k = c_3 = 5.9762$ is, by assumption, an upper bound on $c_L$. Thus

$c_L$(normal) = min(1.85, 5.98) = 1.85
$c_L$(small sample) = min(1.24, 5.98) = 1.24
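Every bound in paragraphs 8 to 25 has the single form $c_L = \min(c_1 + L/\beta_U,\ c_k)$. A short Python sketch (the function name is ours) applies it to the small-sample slope for k = 3. Because the printed slope .0084 is rounded to four decimals, results computed from it may differ from the printout in the last digits.

```python
def safe_conc_lower_bound(c1: float, L: float, beta_u: float, ck: float) -> float:
    """c_L = min(c_1 + L / beta_U, c_k): c1 is the control concentration,
    L the excess response rate that defines 'safe', beta_u the slope from
    paragraph 7, and ck the assumed upper bound on the safe concentration."""
    return min(c1 + L / beta_u, ck)

C1 = 0.0494       # control-group concentration c_1
C3 = 5.9762       # assumed upper bound c_k on c_L, with k = 3
BETA_U = 0.0084   # minimum small-sample slope, as printed (rounded)

bounds = {L: safe_conc_lower_bound(C1, L, BETA_U, C3) for L in (0.01, 0.05, 0.10)}
for L, c in bounds.items():
    print(f"L = {L:.2f}: c_L = {c:.4f}")
```

With this slope the cap $c_3 = 5.9762$ already binds at L = 0.05, which is the behaviour described in paragraphs 10 to 12.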

Since the response rates are so extreme in 5 of the 6 groups (i.e., close to 0 or 1), we have small expected frequencies in many of the cells, and asymptotic normal theory is suspect here. We will thus confine attention to calculations based on exact small sample theory for the remainder of the section.

9. We modify the parameters defining the procedure. k and UCR remain at 3 and 50.0, respectively, but L is changed to 0.05. We thus define the "safe" concentration as that which yields a response rate of .05 above control.

10. Proceeding through the same calculations as before, we find that the minimum slope (small sample) is .0084. Thus

$c_1 + L/\beta_U = .0494 + .05/.0084 \approx 6.00$

Since $c_k = c_3 = 5.9762$, we estimate

$c_L = \min(c_1 + L/\beta_U,\ c_k) = \min(6.00,\ 5.98) = 5.98$

11. We now alter L to 0.10, leaving k and UCR as before.

12. $c_L = \min(c_1 + L/\beta_U,\ c_k) = \min(.0494 + .10/.0084,\ 5.9762) = \min(11.954,\ 5.9762) = 5.9762$

Thus $c_L$ is constrained by the overly conservative assumption about $c_k$.

13. We now set k = 4 ($c_4 = 14.8125$) and set L back to 0.01.

14.  We  now  calculate  slopes,  but  we  have  fewer  to  work  with. 
Namely  we  use  c^,  c^,  c^  in  various  combinations. 

15. $c_L = \min(c_1 + L/\beta_U,\ c_k) = \min(.0494 + .01/.0082,\ 14.8125) = 1.2625$


16.  We  now  change  L  to  0.05,  leaving  the  other  parameters  as 
before. 

17. $c_L = \min(c_1 + L/\beta_U,\ c_k) = \min(.0494 + .05/.0082,\ 14.8125) = \min(6.1145,\ 14.8125) = 6.1145$

18.  We  change  L  to  0.10  leaving  k,  UCR  unchanged. 

19. $c_L = \min(c_1 + L/\beta_U,\ 14.8125) = \min(.0494 + .10/.0082,\ 14.8125) = \min(12.1796,\ 14.8125) = 12.1796$

20. We now change k to 5 and set L back to 0.01. Thus $c_k = c_5 = 48.3074$.

21. L = 0.01, k = 5:

$c_L = \min(c_1 + L/\beta_U,\ c_k) = \min(.0494 + .01/.0080,\ 48.3074) = 1.3045$

22. Change L to 0.05.

23. L = 0.05, k = 5:

$c_L = \min(c_1 + L/\beta_U,\ c_k) = \min(.0494 + .05/.0080,\ 48.3074) = 6.3246$

25. L = 0.10, k = 5:

$c_L = \min(c_1 + L/\beta_U,\ c_k) = \min(.0494 + .10/.0080,\ 48.3074) = 12.5998$

We thus conclude that k = 3 is too conservative. Setting k = 4 or 5 yields nearly the same lower bound on $c_L$. In particular, for k = 5:

L = .01    $c_L$ = 1.3045
L = .05    $c_L$ = 6.3246
L = .10    $c_L$ = 12.5998
We now compare the lower confidence bounds on $c_L$ obtained from the nonparametric dose response curve fits to those obtained by more classical probit fits in Section XIV.

Nonparametric bounds (k = 5, UCR = 50.0):

L       Small sample   Asymptotic
0.01    1.3045         1.496
0.05    6.3246         7.282
0.10    12.5998        14.515

We fitted a probit model to the data (untransformed concentration) using SAS PROC PROBIT. We corrected for background with Abbott's correction and obtained

L       Point estimate   Lower 95% conf. bound
0.01    3.7725           -157.377
0.05    22.6065          -84.3307
0.10    32.6468          -45.9667


The lower bounds based on the probit fit with background are thus useless. We refitted the model with the assumption of no background response; this is therefore a more restrictive model. Results:

L       Point estimate   Lower 95% conf. bound
0.01    1.6567           -17.6457
0.05    21.1299          8.8997
0.10    31.5110          21.2620

Thus at L = .05 and especially at L = 0.10 the nonparametric bounds are more conservative. However, they are based on many fewer assumptions.
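Abbott's correction, used for the probit fits above (and for the logistic fit later), treats the total response rate as $P = C + (1 - C)F$, where C is the background (control) rate and F is the response attributable to the toxicant. A minimal Python sketch of the correction and its inverse follows; the function names are ours, and the example rates (0.06 control, 0.13 treated) are the Holcombe and Phipps compound D mortality rates for the control group and group 4 quoted in Section XVII.

```python
def abbott_corrected(p_obs: float, p_control: float) -> float:
    """Abbott's correction: response attributable to the toxicant,
    F = (P - C) / (1 - C), where C is the background (control) rate."""
    if not 0.0 <= p_control < 1.0:
        raise ValueError("control rate must lie in [0, 1)")
    return (p_obs - p_control) / (1.0 - p_control)

def total_response(f_toxicant: float, p_control: float) -> float:
    """Inverse of Abbott's correction: P = C + (1 - C) * F."""
    return p_control + (1.0 - p_control) * f_toxicant

f4 = abbott_corrected(0.13, 0.06)   # group 4 relative to the control group
print(round(f4, 4))
```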

An  attempt  was  made  to  fit  the  probit  model  to  log  concentration, 
as  suggested  by  Finney.  The  probit  program  would  not  converge  at  all. 

We now apply the nonparametric dose response program to the Holcombe and Phipps compound D fry mortality data and compare the estimates of safe concentration with those based on probit and on logit fits. The logic underlying the procedure is indicated in Figure XVI.1.

Refer to the computer printout (nonparametric). In this example the control group is at concentration 0, so we do not have to adjust for its effects. However, we do have a significant background effect.

We see from the listing of the data ((1) on the following computer printout) and the mortality rates by group that group 5 has an observed fry mortality rate of 0.79 while group 4 has an observed fry mortality rate of 0.13. Confidence intervals on these values of p clearly confirm that group 4 is below the median of the dose response curve while group 5 is above the median. Therefore UCR lies somewhere between group 4 and group 5. We have taken it at concentration 100 µg/liter, about midway between the two groups. Thus $c_r = c_4 = 72.9499$.

If we define the "safe" values of L to be below 0.10 (over and above control), we should try $c_k = c_3 = 44.9049$ or $c_k = c_4 = 72.9499$ as upper bounds for $c_L$. We consider the results of the small sample calculations.

First  trying  k  =  3: 

L = 0.01: $c_L = \min(L/\beta_U,\ c_k) = \min(0.01/0.0027,\ 44.9049) = 3.663$

L = 0.05: $c_L = \min(L/\beta_U,\ c_k) = \min(0.05/0.0027,\ 44.9049) = 18.3315$

L = 0.10: $c_L = \min(L/\beta_U,\ c_k) = \min(0.10/0.0027,\ 44.9049) = 36.6629$

If we next try k = 4, we have less of an adjustment for simultaneity, and so we get slightly shorter intervals in this case. Namely, for k = 4:

L = 0.01    $c_L$ = 3.8453
L = 0.05    $c_L$ = 19.2264
L = 0.10    $c_L$ = 38.4529

Thus k = 3 and k = 4 yield essentially the same results for all practical purposes.

Let us now compare these results with those obtained by fitting probit and logit models to the data.

Probit models were fitted to the data vs CONC (untransformed) and vs log10(CONC). Both probit fits have nonsignificant residual chi square values of about the same magnitude and so are judged to fit the data about equally well. The following 95 percent lower confidence bounds on safe concentration were obtained from these fits.
PROBIT FITS

L       Untransformed CONC   log10 CONC
0.01    22.403               45.764
0.05    46.147               57.470
0.10    58.715               64.836


We see that the nonparametric fit is much more conservative than the probit fits, since it is based on low dose linearity. Note that, especially at the low percentiles, the probit inference is quite sensitive to whether untransformed concentration or log concentration is used.

We also fitted a logistic dose response to the data using the SAS nonlinear regression program, PROC NLIN. Only untransformed concentrations were used with the logistic fit. Background response was adjusted for by Abbott's correction, just as we did with the probit fit. The following summarizes the results of the logistic fit and compares the inferences with those based on the probit fits.

Chi square for goodness of fit: 0.282 with 3 d.f.

Thus there is no evidence of lack of fit of the model to the data, and the logit and probit models fit about equally well.


Point estimates of percentiles (after adjusting for background)

L       Probit (untransformed CONC)   Probit (log CONC)   Logit (untransformed CONC)
0.01    53.385                        60.974              45.31
0.05    69.946                        71.826              68.757
0.10    78.774                        78.380              79.370

Thus, except for L = 0.01, where there is a bit of disparity among models (although not of practical concern), the three models yield essentially the same percentile estimates.
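Why the fits agree near the middle of the response distribution but drift apart at L = 0.01 can be seen by inverting the two tolerance models directly. The Python sketch below uses hypothetical coefficients (not the fitted values, which are not listed in this section), writes the probit model as $\Phi(a + bx)$ without the conventional +5 offset, and matches the logistic curve to the probit curve in median and standard deviation.

```python
import math
from statistics import NormalDist

def probit_percentile(L: float, a: float, b: float) -> float:
    """Concentration x at which Phi(a + b*x) = L."""
    return (NormalDist().inv_cdf(L) - a) / b

def logit_percentile(L: float, a: float, b: float) -> float:
    """Concentration x at which 1 / (1 + exp(-(a + b*x))) = L."""
    return (math.log(L / (1.0 - L)) - a) / b

# Hypothetical probit fit: tolerances ~ Normal(mean 80, sd 20).
a_p, b_p = -4.0, 0.05
# Logistic curve with the same median (80) and standard deviation (20);
# a logistic distribution with scale s has sd = pi * s / sqrt(3).
s = 20.0 * math.sqrt(3.0) / math.pi
b_l = 1.0 / s
a_l = -80.0 * b_l

for L in (0.01, 0.05, 0.10, 0.50):
    xp = probit_percentile(L, a_p, b_p)
    xl = logit_percentile(L, a_l, b_l)
    print(f"L = {L:.2f}: probit {xp:6.2f}, logit {xl:6.2f}")
```

With these made-up coefficients the two curves agree at the median, differ by under 1.5 concentration units at the 10th percentile, and differ by about 4 units at the 1st percentile, because the logistic distribution has heavier tails than the normal.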

We  now  consider  lower  confidence  bounds  on  these  same  percentiles 
based  on  the  three  parametric  fits  and  on  the  nonparametric  fit. 
(Output from special purpose program to calculate lower bounds on safe concentration based on nonparametric dose response curve fit.)

L       Probit (untransformed CONC)   Probit (log CONC)   Logit (untransformed CONC)   Nonparametric
0.01    22.403                        45.764              2.515                        3.845
0.05    46.147                        57.470              38.390                       19.226
0.10    58.715                        64.836              54.526                       38.453

Confidence intervals for parametric fits are based on 95% two-sided nonsimultaneous procedures using Fieller's theorem. Nonparametric confidence intervals are based on a simultaneous procedure with familywise confidence level of 95%.
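Fieller's theorem gives confidence limits for a ratio of estimates. A probit or logit percentile can be written as $x_L = (z_L - \beta_0)/\beta_1$, where $z_L$ is the probit (or logit) of L, so the numerator is $z_L - \beta_0$ and the denominator is $\beta_1$. The Python sketch below is the textbook form of the theorem; the function names are ours, the default critical value t = 1.96 is an assumption for the large-sample case, and the inputs (estimates, variances, covariance) would come from the fitted model. When a heterogeneity factor is used, as in the Benoit example, the variances and covariance are multiplied by it.

```python
import math

def fieller_limits(a, b, var_a, var_b, cov_ab, t):
    """Fieller confidence limits for the ratio E[a] / E[b]: the theta values
    with (a - theta*b)^2 <= t^2 * Var(a - theta*b), i.e. between the roots
    of a quadratic in theta."""
    A = b * b - t * t * var_b
    if A <= 0:
        # Denominator not clearly away from zero: the set is not a finite interval.
        raise ValueError("t^2 Var(b) / b^2 >= 1; Fieller set is unbounded")
    B = a * b - t * t * cov_ab
    C = a * a - t * t * var_a
    root = math.sqrt(max(B * B - A * C, 0.0))   # guard against rounding
    return (B - root) / A, (B + root) / A

def percentile_limits(z_L, beta0, beta1, var0, var1, cov01, t=1.96):
    """Limits for x_L = (z_L - beta0) / beta1.  Var(z_L - beta0) = var0 and
    Cov(z_L - beta0, beta1) = -cov01, which is what fieller_limits needs."""
    return fieller_limits(z_L - beta0, beta1, var0, var1, -cov01, t)

# Illustrative (made-up) probit estimates; z_L = 0 corresponds to L = 0.5.
lo, hi = percentile_limits(0.0, -4.0, 0.05, var0=0.25, var1=1e-4, cov01=-0.004)
```

Note that the interval is not symmetric about the point estimate $a/b$; this asymmetry grows as the slope becomes less precisely estimated.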


Comparing the lower confidence bounds on safe concentration, we see the following relations.

1. The nonparametric procedure is more conservative than the parametric procedures, especially at the very low percentiles. This is because the nonparametric procedure is built on the assumption of linear approximations in the low concentration region, while the logit and probit fits approach their limiting values through exponential decay. Which estimate is more appropriate would need to be a matter of biological judgement. Since responses in the region of concentration for which L < 0.05 are dominated by background response, the data themselves do not provide much empirical evidence.

2. The results at L = 0.01 are surprisingly inconsistent across model fits. There is roughly an order of magnitude difference between the confidence bounds based on the logit and probit fits, despite the fact that the point estimates are in good agreement. There is even a factor of two difference between the bounds based on the two probit fits, despite good agreement of the point estimates. Thus the inferences at low percentiles are very sensitive to the model assumed. The nonparametric procedure, while conservative, is based on many fewer assumptions.

In this example the probit model fitted the data quite nicely and yielded more liberal confidence bounds on $c_L$ than did the nonparametric procedure. It is our experience that this is not always the case. In some situations the probit or logit model does not fit the data well. In other situations Fieller's method may yield lower confidence bounds on $c_L$ which are negative! We saw this in the case of the DeFoe data. In such cases the nonparametric procedure can provide more liberal bounds than the parametric procedures. We will see this in the following example.


We now place lower confidence bounds on the safe concentration for the Benoit compound A fry mortality data. We first consider the nonparametric procedure and then compare results with bounds based on the probit model. Figures XVI.2 and XVI.3 contain plots of proportions of dead fry vs untransformed concentration and vs $\log_{10}$(concentration), respectively.

Based  on  the  appearance  of  these  plots,  a  probit  dose  response  model 
does  not  seem  to  hold  very  well,  especially  with  respect  to  untransformed 
concentration.  Furthermore,  there  is  some  question  about  homogeneity  of 
response  rates  within  tanks  at  the  highest  concentration. 

We first consider the nonparametric procedure. We see that the average concentration at the control group is 0.0809 and there was no observed fry mortality there. Based on the plots of proportion of dead fry vs concentration and on the fry mortality proportions printed out by the program, we set UCR, the upper bound on the upward concavity region, to be somewhere above the 5th treatment group. In particular we set UCR = 15.0. Then $c_r = c_5 = 13.3182$.

Note  that  for  the  purpose  of  illustration  we  are  assuming  that  there 
is  no  tank  to  tank  heterogeneity  within  groups.  This  assumption  needs 
to  be  checked  and  appropriate  modifications  made,  if  necessary. 

Because of the many sample proportions close to 0, the asymptotic normality assumption is questionable, and so we use the small sample confidence bounds.

Based on the observed proportions of dead fry in the various groups, if L is less than or equal to 0.10 it makes sense to choose $c_k$, the upper bound on safe concentrations, to be $c_4$ or $c_5$. For definiteness we choose $c_4$ here. Thus we have UCR = 15.0 and k = 4:

L = 0.01: $c_L = \min(c_1 + L/\beta_U,\ c_k) = \min(0.0809 + 0.01/0.0423,\ 6.6020) = \min(0.0809 + 0.2364,\ 6.6020) = 0.3173$

L = 0.05: $c_L = \min(c_1 + L/\beta_U,\ c_k) = \min(0.0809 + 1.1819,\ 6.6020) = 1.2628$

L = 0.10: $c_L = \min(c_1 + L/\beta_U,\ c_k) = \min(0.0809 + 2.3638,\ 6.6020) = 2.4447$


We  now  compare  these  results  with  those  based  on  the  probit  fit, 
using  SAS  PROC  PROBIT. 

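First, models including background variation were assumed, namely

$$p(\mathrm{CONC}) = C + (1 - C)\,\Phi\big((\beta_0 - 5) + \beta_1\,\mathrm{CONC}\big)$$
$$p(\mathrm{CONC}) = C + (1 - C)\,\Phi\big((\beta_0 - 5) + \beta_1 \log_{10}(\mathrm{CONC})\big)$$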


The  maximum  likelihood  fitting  algorithm  was  unable  to  converge  with 
either  of  these  three  parameter  models! 

Next the background rate was specified to be 0, and two parameter probit models were fitted to the data. The plot of proportion of dead fry vs CONC suggests that the probit model does not fit the untransformed concentration, and in fact the fitted model shows substantial lack of fit to the data. We therefore consider the probit fit in log concentration. This fit is better but still exhibits marginal statistical evidence of lack of fit (residual chi square = 8.00 with 4 d.f., observed significance level about 0.09). Based on this fit, the 95% lower confidence bounds on response distribution percentiles (using Fieller's theorem and adjusted by Finney's heterogeneity factor) are:


L = 0.01    $c_L$ = 0.082
L = 0.05    $c_L$ = 0.523
L = 0.10    $c_L$ = 1.362

Thus in this example these bounds are lower than those based on the nonparametric procedure. They are also based on a much more restrictive model.

The nonparametric procedure seems quite superior in this example.
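The significance level of the residual chi square quoted for the Benoit log-concentration probit fit (8.00 with 4 d.f.) and the heterogeneity factor used with it can be checked with a few lines of Python. For an even number of degrees of freedom the chi-square upper tail has a closed form (a Poisson sum), which this sketch uses; the function name is ours, and Finney's heterogeneity factor is taken as the residual chi square divided by its degrees of freedom.

```python
import math

def chi2_upper_tail_even_df(x: float, df: int) -> float:
    """P(chi-square with df d.f. > x), exact for even df:
    exp(-x/2) * sum over i < df/2 of (x/2)**i / i!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i           # builds (x/2)**i / i!
        total += term
    return math.exp(-half) * total

chi2, df = 8.00, 4
p_value = chi2_upper_tail_even_df(chi2, df)   # observed significance level
heterogeneity = chi2 / df                      # Finney's heterogeneity factor
print(f"P = {p_value:.3f}, heterogeneity factor = {heterogeneity:.1f}")
```

The goodness of fit chi square of 0.282 with 3 d.f. quoted for the Holcombe and Phipps logistic fit has odd degrees of freedom, so this particular shortcut does not apply to it.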
XVII. ANALYSIS OF QUANTITATIVE RESPONSES


Most of the preceding discussion has been concerned with aspects
of  the  analysis  of  quantal  survival  data.  This  includes  preliminary 
graphical  displays,  tests  for  heterogeneity  among  tanks  within  groups, 
outlier  detection  procedures,  adjustments  for  tank  to  tank  heterogeneity, 
analysis  of  variance  and  multiple  comparison  procedures,  and  various 
types  of  dose  response  estimation  procedures. 

Quantitative  responses  such  as  length  and  weight  are  also  recorded 
during  and  at  the  conclusion  of  toxicity  tests.  Weight  measurements 
(in mg) on all surviving fish at the conclusion of early life stage tests are standard. The statistical analysis of such responses is discussed in this section. This discussion is also directly relevant to
the  statistical  analysis  of  the  length  and  weight  measurements  that  are 
taken  at  periodic  intervals  (e.g.  at  30  day  or  60  day  intervals)  in  full 
life  cycle  tests. 

The approach to analyzing weight and length data is directly analogous to the approach to analyzing survival data. Each aspect of analyzing quantal data mentioned above has a direct counterpart for analyzing quantitative data. In fact the procedures appropriate for analyzing quantitative responses are more "standard" and more familiar to most users of statistical methodology than those appropriate for analyzing quantal responses.

In addition to weight and length data, a quantitative response often recorded and analyzed in mammalian toxicology tests is time to death (or time to tumor, or time to any a priori specified event). Such time to
death  data  provide  more  information  than  the  30  day,  60  day,  120  day, 
etc.,  survival  rates  that  are  commonly  reported  in  aquatic  toxicity  tests. 
In  particular  knowledge  of  time  to  death  of  each  embryo  or  fry  yields  the 
percent  survival  responses  as  a  byproduct.  However  30  day  survival  data 
will  not  reveal  whether  the  fish  that  died  did  so  on  day  1,  day  15,  or 
day  29.  Such  information  is  important  for  understanding  the  mechanisms 
by  which  the  toxicants  act. 

The  analysis  of  time  to  death  data  involves  working  with  censored 
responses.  Parametric  and  nonparametric  approaches  to  the  analysis  of 
such  data  are  discussed  in  a  number  of  books  such  as  Gross  and  Clark  [49] 
and Kalbfleisch and Prentice [50].

Time  to  death  (to  the  nearest  day)  data  is  usually  collected  as  part 
of  the  day  to  day  test  procedure  since  the  tanks  are  examined  daily  (or 
at  least  on  weekdays)  for  dead  fish  and  these  are  removed  and  recorded. 
Unfortunately,  time  to  death  data  is  not  routinely  reported  as  part  of 
the  experimental  results.  In  particular  time  to  death  of  individual 
fish  was  not  included  in  any  of  the  data  sets  made  available  to  us. 

As  such,  the  analysis  of  censored  life  data  is  not  discussed  here. 


We would make a strong recommendation that time to death of individual fish be routinely reported in the future instead of, or in addition to, 30 day, 60 day, etc., percent survival data. This would require little additional cost or effort but could provide valuable additional information.

In  the  remainder  of  the  section  we  consider  the  various  aspects  of 
analyzing  the  weights  recorded  on  survivors  of  30  day  early  life  stage 
tests.  Before  getting  down  to  the  technical  and  methodological  issues, 
several  conceptual  points  should  be  discussed.  The  primary  difficulty 
in  the  interpretation  of  weight  data  is  the  confounding  of  weight  gain 
with  survival.  A  number  of  scenarios  can  be  postulated  leading  to 
different  conclusions  about  the  relationships  to  be  anticipated.  Death 
can  be  thought  of  as  the  first  order  effect  of  the  toxicant  and  weight 
loss  (or  lesser  weight  gain)  as  a  secondary  effect.  If  death  and  weight 
loss  represent  different  degrees  of  severity  of  the  same  mechanism  then 
one  would  expect  that  average  weights  of  survivors  would  decrease  as  the 
mortality  rate  increases  and  lack  of  observation  of  such  a  decrease 
might  be  interpreted  as  lack  of  effect.  However,  since  weights  are 
measured  just  on  survivors  a  selection  phenomenon  may  be  occurring. 
Presumably,  in  the  various  treatment  groups  the  stronger  fish  survive 
while  the  weaker  fish  die.  Presumably  the  weaker  fish  would  have  gained 
less weight on average than the stronger fish, had they survived (e.g., if they were in the control group). Since greater numbers of weak fish
survive  in  the  control  and  low  treatment  level  groups  than  in  the  higher 
treatment  level  groups,  these  weak  survivors  might  decrease  the  average 
weight  gain  relative  to  the  strong  survivors  in  the  treatment  groups. 

Thus  an  increase  in  observed  average  weight  with  treatment  level  might 
be  possible,  or  if  the  toxicant  reduces  weight  gain  in  the  treatment 
groups,  the  selection  and  reduction  effects  may  offset  one  another, 
thereby resulting in no observable trend. Therefore the biological meaning of observed trends in weight gain with increasing concentration, or of the lack of observed trends, depends very much on the biological assumptions about toxicant mechanisms and about the association between survival and weight loss.
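The selection argument can be made concrete with a deliberately extreme toy example. The Python sketch below uses ten hypothetical fry whose potential 30-day weights (mg) are made up, and assumes (1) the toxicant kills exactly the lightest fish and (2) it lowers every survivor's weight by the same amount. Under these assumptions the treated group's observed mean weight can exceed the control mean even though every treated survivor is lighter than it would otherwise have been.

```python
from statistics import mean

# Hypothetical potential 30-day weights (mg) of ten fry: the weight each
# fish would reach if it survived and were unaffected by the toxicant.
POTENTIAL = [90, 95, 100, 105, 110, 115, 120, 125, 130, 135]

def observed_mean_weight(weights, n_killed, weight_loss):
    """Mean weight of the survivors, assuming the toxicant kills the
    n_killed lightest fish and reduces each survivor's weight by weight_loss."""
    survivors = sorted(weights)[n_killed:]
    return mean(w - weight_loss for w in survivors)

control = observed_mean_weight(POTENTIAL, n_killed=0, weight_loss=0)
treated = observed_mean_weight(POTENTIAL, n_killed=4, weight_loss=5)
print(control, treated)
```

Raising weight_loss to 10 makes the two means exactly equal, which is the case where the selection and reduction effects offset one another.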

One way to reduce or eliminate the confounding of the survival and weight gain responses is to confine weight gain comparisons to those concentration groups whose mortality rates are not significantly (either biologically or statistically) greater than that in the control group.

The  rationale  for  this  viewpoint  is  that  mortality  is  a  first  order 
effect  while  weight  gain  is  a  second  order  effect.  Thus  in  groups  with 
significant  mortality,  the  question  of  reduction  in  weight  gain  is  not 
of  concern.  Only  when  the  mortality  rate  approximates  that  in  the  control 
group  is  the  comparison  of  weight  gains  important. 

We  illustrate  the  analysis  of  the  weight  responses  with  fry  data  from 
the  Holcombe  and  Phipps  test  on  compound  D.  Recall  that  the  observed 
mortality  rates  in  the  control  group  and  the  five  treatment  groups  were 
0.06, 0.08, 0.08, 0.13, 0.79, 1.00. It was shown in Section XII that the mortality rates in groups 5 and 6 differ significantly from that in the control group and that group 4 is borderline (statistically) significant according to Williams' procedure. Thus it might be reasonable to confine weight gain comparisons with the control group to treatment groups 2, 3, and possibly 4.

A.  Preliminary  Scatterplots 

A number of plots were prepared showing the means and standard deviations of weight by tank plotted against group number, concentration, log concentration, proportion of survivors, and other variables. Several of
these  plots  are  shown  in  Figures  XVII. 1  to  XVII. 4.  Figure  XVII. 1  shows 
average  weight  per  tank  vs  group  number.  Group  6  does  not  appear  in  this 
plot  since  it  had  100  percent  mortality.  A  downward  trend  in  average 
weight  with  increasing  group  number  is  evident.  Note  that  the  total 
range  of  variation  in  average  weights  is  not  very  great  —  between  100 
mg  and  145  mg.  Figure  XVII. 2  shows  standard  deviation  of  weight  per 
tank  vs  group  number.  With  the  exception  of  one  tank  in  group  5  there 
appears  to  be  no  trend  in  standard  deviation  with  group.  Note  that  the 
standard  deviation  estimates  in  group  5  are  less  stable  than  those  in  the 
other  groups  since  they  are  based  on  many  fewer  observations.  Figures 
XVII.3 and XVII.4 show the average weight per tank vs concentration and vs log concentration, respectively (more specifically log(1 + CONC)). The decreasing trend in average weight with increasing concentration is again evident. In Figure XVII.4 a linear or quadratic trend in log concentration can be seen among the treatment groups.


B. Outlier Detection Procedures and Testing for Tank to Tank Heterogeneity Within Groups


Analysis  of  variance  models  were  fitted  to  the  individual  weights 
and  logarithmic  weights  to  determine  if  there  is  any  statistical  evidence 
of  tank  to  tank  heterogeneity  within  groups  or  of  differences  in  average 
weights across groups. The two way mixed model

$$W_{ijk} = \mu + \alpha_i + T_{j(i)} + \varepsilon_{ijk}, \qquad i = 1, \ldots, I;\ j = 1, \ldots, J;\ k = 1, \ldots, n_{ij}$$

was specified, where $W_{ijk}$ corresponds to the weight or to the log weight of the k-th fish within the j-th tank of the i-th group, $\alpha_i$ is the fixed group effect, $T_{j(i)}$ is the random effect of the j-th tank within the i-th group, and $\varepsilon_{ijk}$ is the experimental variation. It is assumed that the $T_{j(i)}$ are independent $N(0, \sigma_T^2)$, the $\varepsilon_{ijk}$ are independent $N(0, \sigma_e^2)$, and the $T_{j(i)}$ and $\varepsilon_{ijk}$ are independent. In the case of the Holcombe and Phipps Compound D fry mortality data, I = 5, J = 4, and $n_{ij}$ varies with tank and with group but is nearly constant in groups 1 to 4. The model was fitted to the data using PROC GLM in the SAS statistical computing system [12]. The results are shown in Figures XVII.5 to XVII.7.
Figures XVII.5 and XVII.6 show the analysis of variance tables for the responses weight and log weight, respectively. Figure XVII.7 shows the expected mean squares of the entries in the analysis of variance tables in Figures XVII.5 and XVII.6. The conclusions from the fits to the untransformed weights and to the logarithmic weights are very similar. From the expected mean square of TANK(GRP) indicated in Figure XVII.7, it is seen that the hypothesis $H_0: \sigma_T^2 = 0$ is tested by comparing the TANK(GRP) mean square with the error mean square. The resulting F tests are nonsignificant for both the untransformed and logarithmic weight responses (observed significance levels 0.64 and 0.70, respectively).
The estimated variance components are:

untransformed weights: $\hat\sigma_T^2 = (754.97 - 900.35)/19.19 = -7.58$

logarithmic weights: $\hat\sigma_T^2 = (0.0757 - 0.0972)/19.19 = -0.001$

Thus in both cases $\hat\sigma_T^2$ is set equal to zero. Therefore in this example the error mean square may be used as an error yardstick against which to compare the fixed effect mean squares for group effects.
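The variance component estimates above follow from equating each mean square to its expectation, $E[\mathrm{MS\ TANK(GRP)}] = \sigma_e^2 + 19.19\sigma_T^2$ and $E[\mathrm{MS\ ERROR}] = \sigma_e^2$, and then truncating a negative estimate at zero. A Python sketch (the function name is ours) using the mean squares quoted in this section:

```python
def tank_variance_component(ms_tank: float, ms_error: float, n0: float):
    """Method-of-moments (ANOVA) estimate of sigma_T^2 from
    E[MS TANK(GRP)] = sigma_e^2 + n0 * sigma_T^2 and E[MS ERROR] = sigma_e^2.
    Returns the raw estimate and the estimate truncated at zero."""
    raw = (ms_tank - ms_error) / n0
    return raw, max(raw, 0.0)

# n0 = 19.19 is the effective number of fish per tank from Figure XVII.7.
raw_w, est_w = tank_variance_component(754.97, 900.35, 19.19)     # weights
raw_lw, est_lw = tank_variance_component(0.0757, 0.0972, 19.19)   # log weights
print(f"{raw_w:.2f} -> {est_w}, {raw_lw:.4f} -> {est_lw}")
```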

In general $\sigma_T^2$ will not be equal to zero, and so an appropriate error yardstick will be a linear combination of the error mean square and the tank (group) mean square. To see how this works, consider the test of the hypothesis $H_0: \alpha_i = 0$ for all i, that is, no group effects. This null hypothesis is obviously false, and the test given by PROC GLM, based on the error mean square with 366 d.f., rejects $H_0$ very strongly. However, the error mean square underestimates the variability of the group mean square if $\sigma_T^2$ is greater than zero. The type IV expected mean square for group (under $H_0$) is shown in Figure XVII.7 to be $\sigma_e^2 + 18.2048\sigma_T^2$. Thus the tank (group) mean square with 15 d.f. is a more appropriate error yardstick than the error mean square, which estimates only $\sigma_e^2$. In general, a linear combination of these two mean squares would be an even better yardstick.

The classical approach to combining expected mean squares is based on choosing that linear combination which yields an unbiased estimator of $\sigma_e^2 + 18.2048\sigma_T^2$. Namely

$$w[\sigma_e^2 + 19.1915\sigma_T^2] + (1 - w)\sigma_e^2 = \sigma_e^2 + 18.2048\sigma_T^2$$

so that

$$w = \frac{18.2048}{19.1915} \approx 0.95$$

Note that w does not depend on $\sigma_T^2$.

If the design had been completely balanced, this would have led to using the tank (group) mean square with 15 d.f. irrespective of the value of $\sigma_T^2$. Such an approach is analogous to carrying out analyses on a per tank basis rather than on a per fish basis.

For  the  untransformed  weights 

.95  MS  TANK  (GRP)  +  .05  MS  ERROR  =  .95(754.97)  +  .05(900.35)  =  762.26 
For  the  logarithmic  weights 

.95  MS  TANK  (GRP)  +  .05  MS  ERROR  =  .95(0.0757)  +  .05(.0972)  =  0.0768 

To calculate the effective number of degrees of freedom of this linear combination, assume that

$$.95\,\mathrm{MS\ TANK(GRP)} + .05\,\mathrm{MS\ ERROR} \sim (\sigma_e^2 + 18.2048\sigma_T^2)\,\chi^2_\nu/\nu$$

where ν is unknown. Equating the variances of the two sides we obtain:

$$(.95)^2(\sigma_e^2 + 19.1915\sigma_T^2)^2\frac{2}{15} + (.05)^2\sigma_e^4\frac{2}{366} = (\sigma_e^2 + 18.2048\sigma_T^2)^2\frac{2}{\nu}$$

Approximating the expectations by the mean squares yields

(a) untransformed weights

$$(.95)^2(754.97)^2\frac{2}{15} + (.05)^2(900.35)^2\frac{2}{366} = (762.26)^2\frac{2}{\nu}$$

or ν = 16.9

(b) logarithmic weights

$$(.95)^2(.0757)^2\frac{2}{15} + (.05)^2(.0972)^2\frac{2}{366} = (.0768)^2\frac{2}{\nu}$$

or ν = 17.1
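The weighting and effective degrees of freedom above are an instance of Satterthwaite's approximation, with $\mathrm{Var}(\mathrm{MS}_i)$ estimated by $2\,\mathrm{MS}_i^2/\mathrm{df}_i$. A Python sketch (the function name is ours) reproduces both cases from the mean squares in the text:

```python
def satterthwaite(coefs, mean_squares, dfs):
    """Linear combination sum(c_i * MS_i) and its Satterthwaite degrees of
    freedom, using Var(MS_i) ~= 2 * MS_i**2 / df_i (the factor 2 cancels)."""
    combo = sum(c * ms for c, ms in zip(coefs, mean_squares))
    denom = sum((c * ms) ** 2 / df for c, ms, df in zip(coefs, mean_squares, dfs))
    return combo, combo ** 2 / denom

w = 18.2048 / 19.1915           # unbiasedness weight, about 0.95
coefs = [0.95, 0.05]            # rounded weights used in the text
dfs = [15, 366]                 # d.f. of MS TANK(GRP) and MS ERROR

ms_w, nu_w = satterthwaite(coefs, [754.97, 900.35], dfs)      # weights
ms_lw, nu_lw = satterthwaite(coefs, [0.0757, 0.0972], dfs)    # log weights
print(f"w = {w:.3f}; weights: {ms_w:.2f} on {nu_w:.1f} d.f.; "
      f"log weights: {ms_lw:.4f} on {nu_lw:.1f} d.f.")
```

The approximate F ratios that follow are then the type IV group mean squares divided by ms_w and ms_lw, referred to F distributions with 4 and about 17 degrees of freedom.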

Comparison of the type IV group mean square with this error yardstick yields

untransformed weights: 7649.96/762.26 = 10.04
logarithmic weights: 0.6743/0.0768 = 8.78

Although in the case of unbalanced data the numerator and denominator mean squares are not strictly independent and the denominator "mean square" is not strictly distributed as chi square, an approximate test is usually constructed by treating these ratios as F ratios with 4 and 17 degrees of freedom. These ratios are of course very highly statistically significant according to this yardstick. Thus there is very strong statistical evidence of differences in average weights among groups.

It should be noted that this approach of using essentially the tank (group) mean square with 15 d.f. as error yardstick is very close to carrying out group to group comparisons on a per tank basis rather than on a per fish basis. Figure XVII.8 shows the output from a one way analysis of variance to compare group means using the tank means as basic input data. This corresponds to a quantification of the relations seen in Figure XVII.1. Again the group effects are highly significant.

The expected mean square for tank (group), as shown in Figure XVII.7, is as if each tank contained 19.19 fish on average. (The average is actually 19.3.) If we divide MS TANK(GRP) by 19.19 we obtain 39.34, which agrees quite well with the error mean square of 39.40 in Figure XVII.8.

An alternative approach to pooling mean squares in the analysis of variance is based on finding that linear combination of tank (group) and error mean squares which minimizes the mean square difference from $\sigma_e^2 + 18.2048\sigma_T^2$. Namely, choose w so that

$$E\left[w\,\mathrm{MS\ TANK(GRP)} + (1 - w)\,\mathrm{MS\ ERROR} - (\sigma_e^2 + 18.2048\sigma_T^2)\right]^2$$

is minimized. The resulting choice of w will depend on the relative magnitudes of $\sigma_T^2$ and $\sigma_e^2$. As $\sigma_T^2/\sigma_e^2$ approaches zero, more and more emphasis will be placed on the error mean square, because its reduced variance will more than compensate for its bias. The above expectation can be calculated to be

W2(a2  +  19.19a2)2  —•  +  (1  -  w)2a4  _2_  +  [(19.19w  -  18.2048)o2]2 
e  T  15  e  366 

We wish to choose $w$ ($0 \le w \le 1$) to minimize this expression. If we substitute the estimates of the mean squares and of the variance components (the last term vanishes because the estimate of $\sigma_T^2$ is taken as zero), we obtain

$$w^2(754.97)^2\,\frac{2}{15} + (1-w)^2(900.35)^2\,\frac{2}{366} + 0 = 75997.3\,w^2 + 4429.7\,(1-w)^2,$$

which has its minimum in the interval $0 \le w \le 1$ at $w = 0.055$. Thus the second approach leads to the error yardstick

$$0.055\,\mathrm{MS\ TANK(GRP)} + 0.945\,\mathrm{MS\ ERROR} = 892.35$$

with approximate degrees of freedom $\nu$ obtained by solving the equation

$$\frac{(0.055)^2(754.97)^2}{15} + \frac{(0.945)^2(900.35)^2}{366} = \frac{(892.35)^2}{\nu},$$

or $\nu = 380$.

Thus this allocation effectively leads to the use of the error mean square in this case, and is very different from the yardstick obtained by equating expected mean squares. This alternative pooling scheme is most useful when there are too few degrees of freedom for estimating MS TANK (GRP), because it then puts more weight on the error mean square. The criterion used does not, of course, ensure that the resulting mean square is an unbiased estimate of $\sigma_e^2 + 18.2048\sigma_T^2$.
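The minimization and the degrees of freedom calculation can be sketched in Python. This is a minimal sketch assuming the chi-square approximation used above, Var(MS) ≈ 2 E(MS)²/d.f., with the mean squares replaced by their estimates; the function names are hypothetical. Because the bias term is quadratic in w, the minimizer has a closed form, which is then clipped to [0, 1].

```python
def mse_optimal_weight(ms_tank, df_tank, ms_error, df_error,
                       sigma_t2=0.0, k_tank=19.19, k_target=18.2048):
    """Weight w in [0, 1] minimising the estimated mean squared error
        w^2 a + (1 - w)^2 b + ((k_tank * w - k_target) * sigma_t2)^2,
    where a = 2 ms_tank^2 / df_tank and b = 2 ms_error^2 / df_error are the
    estimated variances of the two mean squares."""
    a = 2.0 * ms_tank ** 2 / df_tank
    b = 2.0 * ms_error ** 2 / df_error
    c = k_tank * sigma_t2
    d = k_target * sigma_t2
    # Setting the derivative to zero gives (a + b + c^2) w = b + c d.
    w = (b + c * d) / (a + b + c * c)
    return min(max(w, 0.0), 1.0)


def satterthwaite(weights, mean_squares, dfs):
    """Pooled mean square sum(w_i MS_i) and its Satterthwaite degrees of freedom."""
    pooled = sum(w * m for w, m in zip(weights, mean_squares))
    denom = sum((w * m) ** 2 / f for w, m, f in zip(weights, mean_squares, dfs))
    return pooled, pooled ** 2 / denom


# Values from the text; sigma_T^2 is taken as 0 because its estimate was negative.
w = mse_optimal_weight(754.97, 15, 900.35, 366)
pooled, nu = satterthwaite([0.055, 0.945], [754.97, 900.35], [15, 366])
print(f"w = {w:.3f}, pooled = {pooled:.2f}, nu = {nu:.0f}")
```

Passing a positive `sigma_t2` shows how the weight shifts toward the tank mean square once the bias of the error mean square becomes important.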

It should be noted that the GLM procedure permits the decomposition of the model sum of squares into individual degree of freedom components. This feature is illustrated in Figure XVII.8. The linear and quadratic components of trend are defined by the contrasts (-2, -1, 0, 1, 2) and (2, -1, -2, -1, 2) respectively. These contrasts single out physically important comparisons among the groups to test and estimate, and thereby increase the sensitivity of the analysis of variance tests. This approach is analogous to carrying out a one sided measure of association test with qualitative survival data.

The residuals from the analysis of variance fits can be used to check distributional assumptions and to detect outliers. Figures XVII.9 - XVII.12 display the arithmetic and logarithmic residuals from the fits in Figures XVII.5 and XVII.6 respectively. Figures XVII.9 and XVII.10 show the residuals plotted vs group. No outliers are obvious. The variability seems constant across groups. The residuals from the untransformed weights appear to be much more symmetric about zero than those from the logarithmic weights. Figures XVII.11 and XVII.12 show normal probability plots of the arithmetic and logarithmic residuals respectively. The plot in Figure XVII.11 looks much more nearly normal than that in Figure XVII.12.

The lowest two residuals in Figure XVII.11 lie below the line through the remainder of the data. To determine whether there is any statistical evidence that these observations are outliers, we can test whether the most extreme of 386 independent normally distributed random variables with mean 0 and standard deviation 30 is likely to exceed 118 in absolute value.

(The two extreme residuals correspond to observations 209 and 220, are from group 3, tanks A and B, have values -112.9 and -117.9, and are associated with fish having reported weights of 15 mg and 10 mg. I am assuming that these are the correct weights, but this should be checked.)

$$P[\text{most extreme of 386 observations exceeds 118 in absolute value}] = 1 - [P(-118 < X < 118)]^{386} = 1 - (0.999950)^{386} = 1 - 0.98 = 0.02$$

Thus there is statistical evidence that this extreme residual does not conform to the others. Whether this represents a clerical error or natural biological variation would need to be determined.

Assuming that the extreme observation is an outlier, the second most extreme observation can be compared to the extreme of 385 observations:

$$P[\text{most extreme of 385 observations exceeds 113 in absolute value}] = 1 - [P(-113 < X < 113)]^{385} = 1 - (0.99990)^{385} = 0.037$$

There is thus statistical evidence that this second most extreme observation is also an outlier.
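The outlier calculation can be sketched with Python's standard-library `NormalDist`, assuming, as the text does, independent normal residuals with mean 0 and standard deviation 30 (the function name is hypothetical). Evaluating the normal c.d.f. directly gives probabilities of roughly 0.03 and 0.06, somewhat larger than the 0.02 and 0.037 quoted above, so the second comparison sits close to the 5 percent level; the figures are worth recomputing before any observation is set aside.

```python
from statistics import NormalDist


def prob_max_abs_exceeds(threshold, n, sigma, mu=0.0):
    """P(max_i |X_i| > threshold) for n independent N(mu, sigma^2) variables."""
    dist = NormalDist(mu, sigma)
    p_inside = dist.cdf(threshold) - dist.cdf(-threshold)  # P(-t < X < t)
    return 1.0 - p_inside ** n


# Residual standard deviation of about 30 mg, as assumed in the text.
p_first = prob_max_abs_exceeds(118, 386, 30)    # most extreme residual
p_second = prob_max_abs_exceeds(113, 385, 30)   # next one, after setting the first aside
print(f"first: {p_first:.3f}, second: {p_second:.3f}")
```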

Basic  records  should  be  examined  to  determine  if  these  observations 
are  valid.  If  not,  they  should  be  corrected  or  deleted  and  the  modified 
data  reanalyzed.  If  they  in  fact  represent  natural  biological  variation 
then  biological  judgement  should  be  used  to  determine  whether  or  not  to 
retain  these  observations  with  the  remainder. 

C.  Multiple  Comparison  Procedures  and  Regression  Analyses 

Based on the results of the analysis of variance calculations previously discussed, we can carry out comparisons of average weight gains across groups. The average weight gains (mg) and numbers of animals per group are:

Group      1       2        3       4       5
N          94      92       92      87      21
Average    131.2   135.59   127.9   113.1   108.1

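The variances of these averages, based on averaging the responses from four tanks with varying numbers of fish per group, are

$$\tfrac14\left(\sigma_T^2 + \sigma_e^2/23.5\right),\quad \tfrac14\left(\sigma_T^2 + \sigma_e^2/23\right),\quad \tfrac14\left(\sigma_T^2 + \sigma_e^2/23\right),\quad \tfrac14\left(\sigma_T^2 + \sigma_e^2/21.75\right),\quad \tfrac14\left(\sigma_T^2 + \sigma_e^2/5.25\right),$$

where each divisor is the average number of fish per tank in that group (e.g. 23.5 = 94/4). The variance components $\sigma_T^2$, $\sigma_e^2$ can be estimated by appropriate linear combinations of the tank (group) and error mean squares displayed in Figure XVII.5. For the Holcombe and Phipps fry mortality data $\hat\sigma_T^2 = 0$, and so we can estimate the standard errors using the error mean square. Thus

$$\hat\sigma_e^2 = 900.355 \text{ with 366 d.f.}$$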

Alternative standard error estimates with appropriate degrees of freedom can be constructed using approaches analogous to those discussed in the previous subsection.
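The standard errors follow directly from the variance expressions above. A minimal sketch, assuming four tanks per group and plugging in $\hat\sigma_T^2 = 0$ and $\hat\sigma_e^2 = 900.355$ from the text (the function name is hypothetical); with $\hat\sigma_T^2 = 0$ each standard error reduces to $\hat\sigma_e/\sqrt{N}$.

```python
import math

group_sizes = [94, 92, 92, 87, 21]   # fish per group, 4 tanks each
sigma_e2 = 900.355                   # error mean square, 366 d.f.
sigma_t2 = 0.0                       # tank component, estimated as zero here


def se_group_mean(n_fish, sigma_t2, sigma_e2, n_tanks=4):
    """SE of a group average over n_tanks tanks:
    Var = (1/J)(sigma_T^2 + sigma_e^2 / nbar), nbar = average fish per tank."""
    nbar = n_fish / n_tanks
    return math.sqrt((sigma_t2 + sigma_e2 / nbar) / n_tanks)


ses = [se_group_mean(n, sigma_t2, sigma_e2) for n in group_sizes]
for group, (n, se) in enumerate(zip(group_sizes, ses), start=1):
    print(f"group {group}: N = {n}, SE = {se:.2f} mg")
```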
We apply Williams' procedure [36] to determine which groups have (statistically) significantly lower weight gain than the control group.

We first need to adjust these mean values so that they are in monotone decreasing sequence. We simply calculate the weighted average (94 × 131.2 + 92 × 135.59)/186 = 133.37 of the averages in groups 1 and 2. The modified averages are now in monotone decreasing order. We declare the group i average weight gain to be significantly smaller than the control average if

$$\bar X_{i,\mathrm{adj}} - \bar X_1 < -\bar t\,\hat\sigma_e\left(\frac{1}{N_i} + \frac{1}{N_1}\right)^{1/2}.$$

Note that $\bar X_{i,\mathrm{adj}}$ is the adjusted average whereas $\bar X_1$ is the unadjusted average. We use the factor $\bar t$ obtained from Williams' table, which is derived under the assumption of equal group sample sizes. This assumption is quite reasonable for groups 1-4. We assume that $\sigma_e^2$ is estimated with 366 d.f. A more conservative assumption might be to use 15 d.f., since this is the amount of information concerning $\sigma_T^2$. This would raise $\bar t$ from 1.75 to 1.88.


The group i mean is declared to be significantly lower than the control mean if

$$\bar X_{i,\mathrm{adj}} < \bar X_1 - \bar t\,\hat\sigma_e\left(\frac{1}{N_i} + \frac{1}{N_1}\right)^{1/2} = 131.2 - (1.75)(30.01)\left(\frac{1}{N_i} + \frac{1}{N_1}\right)^{1/2}$$

i    adjusted average    critical value
2    133.37              123.50
3    127.9               123.50
4    113.1               123.39
5    108.1               118.52

Thus the average weight gains in groups 4 and 5 are declared by Williams' procedure to be (statistically) significantly lower than the control group average at the 5 percent level. With respect to mortality, group 5 is obviously much different from the control group, and group 4 is borderline (statistically) significantly different. Thus the quantal survival and quantitative weight responses yield essentially the same conclusions.
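The adjustment and comparisons above can be reproduced in a few lines. This sketch mirrors the calculation in the text: a weighted pool-adjacent-violators step that enforces a non-increasing sequence (with the control included in the pooling, as above), followed by comparison of each adjusted treatment average with the unadjusted control average, using the single value $\bar t = 1.75$ and $\hat\sigma_e = 30.01$ quoted in the text. It does not look up Williams' tabulated $\bar t$ values, which in general depend on the number of groups and the error d.f.; the function name is hypothetical.

```python
import math


def pava_decreasing(means, weights):
    """Weighted pool-adjacent-violators fit forcing a non-increasing sequence."""
    blocks = []  # each block: [weighted mean, total weight, number of groups]
    for m, w in zip(means, weights):
        blocks.append([m, w, 1])
        # Merge while the newest block is larger than its predecessor.
        while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted = []
    for m, _, c in blocks:
        fitted.extend([m] * c)
    return fitted


means = [131.2, 135.59, 127.9, 113.1, 108.1]   # group 1 is the control
sizes = [94, 92, 92, 87, 21]
t_bar, sigma_e = 1.75, 30.01

adjusted = pava_decreasing(means, sizes)
control = means[0]                              # unadjusted control average
results = []
for i in range(1, len(means)):
    crit = control - t_bar * sigma_e * math.sqrt(1 / sizes[i] + 1 / sizes[0])
    results.append((i + 1, adjusted[i], crit, adjusted[i] < crit))

for grp, xadj, crit, sig in results:
    verdict = "significant" if sig else "not significant"
    print(f"group {grp}: adjusted {xadj:.2f}, critical {crit:.2f}, {verdict}")
significant = [grp for grp, _, _, sig in results if sig]
```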

We can fit regression models to the weight gain data to quantify the trends in averages across groups. Figure XVII.4 shows average weight gain plotted against log(1 + CONC). The responses in the four treatment groups show a definite trend, mostly linear but possibly with some second order curvature. Figure XVII.13 shows the results of fitting a cubic polynomial in log(1 + CONC) to the treatment groups and an indicator function to the control group. Namely, the model

$$W_{ijk} = \beta_0 + \beta_1\,\mathrm{ICTL} + \beta_2\,\mathrm{LCONC} + \beta_3\,\mathrm{LCONC}^2 + \beta_4\,\mathrm{LCONC}^3 + \varepsilon_{ijk}$$

was fitted to the weight data, where ICTL = 1 if GRP = 1 and 0 otherwise, and LCONC = log(1 + CONC). This model fits a cubic polynomial to the treatment groups. The parameter $\beta_1$ represents the difference between the control group mean and the extrapolation back to LCONC = 0 along the cubic polynomial. The contrasts estimated at the bottom of the figure correspond to the differences between the mean responses at the treatment groups, based on the polynomial fit, and the control group response.
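To make the role of $\beta_1$ concrete, the sketch below builds one row of the design matrix for this model and evaluates the treatment minus control contrast. It assumes natural logarithms for LCONC and that the control's CONC is recorded as 0. The coefficient values in the example are hypothetical, chosen only to exercise the functions; they are not the fitted values of Figure XVII.13.

```python
import math


def design_row(group, conc):
    """Row [1, ICTL, LCONC, LCONC^2, LCONC^3] for
    W = b0 + b1*ICTL + b2*LCONC + b3*LCONC^2 + b4*LCONC^3 + error.
    Natural log assumed for LCONC = log(1 + CONC)."""
    ictl = 1.0 if group == 1 else 0.0
    lconc = math.log(1.0 + conc)
    return [1.0, ictl, lconc, lconc ** 2, lconc ** 3]


def mean_response(beta, group, conc):
    return sum(b * x for b, x in zip(beta, design_row(group, conc)))


def contrast_vs_control(beta, conc):
    """Fitted treatment mean at CONC minus the fitted control mean (CONC = 0)."""
    return mean_response(beta, group=2, conc=conc) - mean_response(beta, group=1, conc=0.0)


beta = [140.0, -5.0, -10.0, 1.0, 0.0]   # hypothetical coefficients
print(design_row(1, 0.0))               # control row: [1.0, 1.0, 0.0, 0.0, 0.0]
print(contrast_vs_control(beta, 0.0))   # prints 5.0, i.e. -beta[1]
```

At LCONC = 0 the cubic contributes nothing, so the contrast there is just $-\beta_1$: the gap between the extrapolated polynomial and the control mean.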

A complication in inference procedures arises if there is tank to tank heterogeneity within groups. Observations within the same tank are then dependent, due to a common tank effect. The expectation of the type IV mean square for GRPS is seen in Figure XVII.7 to be inflated from $\sigma_e^2$ to $\sigma_e^2 + 18.2048\sigma_T^2$ by such heterogeneity. The standard errors in Figure XVII.13 might then be inflated by the factor $[1 + 18.2048\sigma_T^2/\sigma_e^2]^{1/2}$ to account for it. The quantity $\sigma_e^2 + 18.2048\sigma_T^2$ can be estimated by pooling mean squares in Figure XVII.5 in either of the two ways discussed in the previous subsection. These yield estimates of 762.26 with 17 d.f. or 892.35 with 380 d.f. Alternatively, regression fits can be carried out on a per tank basis, as is commonly done. This turns out to be very similar to using the 17 d.f. variance estimate.

Since $\sigma_T^2$ was nonsignificant in the previous ANOVA fit, since $\hat\sigma_T^2 = 0$, and since the variance estimate with 380 d.f. is very similar to the error mean square in Figure XVII.13, we will use the error mean square as the basis of standard error calculations in this example. It should be noted, however, that this is appropriate only if $\sigma_T^2 = 0$.

We  see  from  the  type  I  sums  of  squares  in  Figure  XVII. 13  that  the 
linear  component  of  trend  is  highly  significant  while  the  quadratic  and 
cubic  trends  are  nonsignificant  over  and  above  the  linear  trend.  This 
agrees  with  the  appearance  of  Figure  XVII. 4.  The  quadratic  and  cubic 
terms  should  be  deleted  and  the  model  refitted.  The  contrasts  at  the 
bottom  of  Figure  XVII. 13  show,  in  agreement  with  the  results  from 
Williams'  procedure,  that  groups  4  and  5  differ  significantly  from  the 
control  group  while  groups  2  and  3  do  not. 
Compound D fry mortality data

Figure XVII.9 Residuals vs group
Figure XVII.10 Logarithmic residuals vs group
Figure XVII.11 Normal probability plot of residuals from untransformed weights
Figure XVII.12 Normal probability plot of residuals from logarithmic weights

XVIII. EXPERIMENTAL DESIGN CONSIDERATIONS

In this section we consider a number of issues pertaining to the design and conduct of aquatic toxicity tests, such as the precision to be expected as a function of sample size, the allocation of tanks among treatment groups, additional variables to measure, and the numbers of tanks to be run.

A. Assumptions, Additional Variables to Measure, Numbers and Allocations of Tanks

We briefly discuss several assumptions and recommendations associated with planning toxicity tests.

It is assumed that the constraint on the size of a test is the number of fish tanks that can be run within cost, manpower, and apparatus limitations. The cost of running a test is assumed to be directly proportional to the number of tanks run. Once a tank is put on test, the fish are essentially free. Thus a sample size strategy for early life stage tests is to run as many tanks as can be afforded and to fill each tank with the maximum number of embryos or fry that is biologically sensible. This may differ for full life cycle tests.

A sufficient number of fish tanks should be included in the tests to be able to detect the presence of tank to tank heterogeneity within groups. If we have enough degrees of freedom among tanks within groups, the need for pooling mean squares in the analysis of variance to improve the sensitivity of tests or estimates of treatment effects is diminished. We can analyze the data on a per tank basis if we wish, which is appropriate whether or not tank to tank heterogeneity is present. The main difficulty with per tank analyses arises when the lack of adequate degrees of freedom diminishes the precision of inference. We recommend that there be at least 12 d.f. to estimate tank to tank variation within groups. This would correspond to an average of 3 tanks per group if 6 treatment groups were being run. However, to account for the possibility of zero percent mortality in the control group and 100 percent mortality in the highest treatment group, it is suggested that 4 replicate tanks per group be run. This would provide 12 d.f. for estimating variability based on the results of the four intermediate groups alone. A glance at the charts of the noncentral t or noncentral F distribution shows that the power of analysis of variance type tests based on 12 d.f. is nearly as great as that with an infinite number of degrees of freedom. For example, for α = 0.05 and one d.f. in the numerator:

noncentrality (φ)    1.5
power, 12 d.f.       0.50
                     0.945
                     0.972


The  differences  in  power  are  of  little  qualitative  importance. 


An important distinction should be recognized between the numbers of tanks needed to estimate the variability of treatment group responses and the numbers of tanks needed to reduce the effect of tank to tank variability. Suppose that we run J tanks per treatment group and n fish per tank, and that $\sigma_T^2$, $\sigma_e^2$ represent the components of variation between tanks and between fish within tanks respectively. Then the variance of a treatment group average is

$$\frac{\sigma_T^2}{J} + \frac{\sigma_e^2}{Jn}.$$

If $\sigma_T^2$ is large relative to $\sigma_e^2/n$, the only way to reduce the variance is by increasing J. However, for fixed J, the ability to estimate this variance with 12 d.f. yields nearly as much sensitivity of tests and confidence intervals as an estimate with infinite d.f. This is also reflected in the fact that the upper 97.5 percentile of the t distribution is 2.78 with 4 d.f., 2.18 with 12 d.f., and 1.96 with infinite d.f. Thus 12 d.f. is most of the way between 4 d.f. and infinite d.f.
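The distinction can be seen numerically from the variance formula above. In this sketch the component values $\sigma_T^2 = 100$ and $\sigma_e^2 = 900$ are hypothetical, chosen only to show that once $\sigma_T^2/J$ dominates, adding fish per tank barely helps while adding tanks does (the function name is also hypothetical).

```python
def var_group_mean(J, n, sigma_t2, sigma_e2):
    """Variance of a group average with J tanks and n fish per tank:
    sigma_T^2 / J + sigma_e^2 / (J n)."""
    return sigma_t2 / J + sigma_e2 / (J * n)


sigma_t2, sigma_e2 = 100.0, 900.0   # hypothetical variance components
for J, n in [(4, 25), (4, 50), (4, 1000), (8, 25)]:
    v = var_group_mean(J, n, sigma_t2, sigma_e2)
    print(f"J = {J}, n = {n:4d}: variance = {v:.2f}")
```

Doubling n from 25 to 50 lowers the variance from 34.0 to 29.5, and no value of n can take it below $\sigma_T^2/J = 25$; doubling J instead brings it down to 17.0.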

It should be noted that while we recommend at least 12 d.f. for estimating variability, the tanks do not necessarily need to be equally replicated across groups. In fact we will suggest an unequal allocation later in this section.

Another important aspect of planning experiments is specifying classes of variables. At least six classes of variables can be distinguished: responses to be measured, controlled experimental variables, blocking variables, variables to be held constant, covariates to be measured, and variables to be randomized over. Rather than present a detailed discussion of each class of variable, we will emphasize those aspects which either vary from practice or are less obvious. In addition to the responses currently measured, it was argued in the previous section that individual times to death should be reported. This response must be measured but usually is neither reported nor analyzed. Obvious blocking variables are fish tank or test series. Other, less obvious blocking variables that might be incorporated into investigations are homogeneous subsets of fish (e.g., offspring of common parents, fish raised in the same breeding chamber, fish purchased from a single supplier at a single time, etc.), investigators, laboratories, the time period when the test was conducted, technician, and many others. Some of the latter blocking factors would most naturally occur in round robin tests. In any test program a number of variables are held constant, at least nominally. Examples are water temperature, pH, hardness, and levels of additives or impurities; type, amount, and frequency of food; type of fish tank; and photoperiod. All these variables must be reported with the experimental results so that experimental conditions can be repeated and results compared across laboratories. Differences in variables held constant will sometimes account for discrepancies in results.

Covariates are factors which cannot be controlled but which can be measured and taken into account when analyzing the data. Covariates commonly reported are deviations from nominal in either controlled experimental variables or variables held constant. The most obvious covariate is actual test concentration. This should be determined periodically in each tank and reported. The analysis of the data should be based on actual rather than nominal toxicant concentrations. The question of how to summarize toxicant levels has biological as well as statistical aspects. For example, is the effective level the average, the median, the maximum, or some other summary? The question of frequency of measurement pertains to the short term effects of fluctuations in levels. The greater the effect of short term fluctuations in toxicant levels, the more frequently they need to be measured. This aspect will not be considered further here. Other covariates to be measured and used for analysis might be water temperature, hardness, or pH, and measures of the size or health of the brood stock from which the test fish were taken. The remaining class of variables, those to be randomized over, is perhaps the most numerous. However, the variables thought to be most important were included in the other five categories. These variables, many of which are not explicitly known, are randomized over. Their effects thus enter into the experimental variability. It is hoped that their effects are not too great.


B.  Sample  Size  and  Power  Considerations  for  Quantal  Survival  Data 

We  first  assume  that  there  is  no  tank  to  tank  variation  within 
groups.  We  later  modify  the  results  to  account  for  tank  effects  by 
adjusting  sample  sizes  downward  to  "effective  sample  sizes." 

If there is no tank to tank variation, then it suffices to consider just the number of fish run per control or treatment group. Suppose that there is a control group (group 0) and I treatment groups (groups 1, 2, ..., I). In standard practice I = 5. Suppose that we run $N_0$ fish in the control group and N fish in each treatment group. The test then involves a total of $N_0 + IN = C$ fish. If we carry out pairwise comparisons of treatment and control groups based on the arcsine transformation of observed response rates, the variances of $2\arcsin\sqrt{\hat p_i} - 2\arcsin\sqrt{\hat p_0}$ are approximately $1/N_0 + 1/N$. We wish to allocate fish to treatment and control groups so as to minimize $1/N_0 + 1/N$ subject to $N_0 + IN = C$, fixed. This is a Lagrange multiplier problem whose solution is $N_0 = N I^{1/2}$. Thus the more treatment groups, the greater is the sample size in the control group relative to that in the treatment groups. This is because the control group enters into all pairwise comparisons, whereas each treatment group enters into just one. The suggested sample sizes are then

$$N = \frac{C}{I + I^{1/2}}, \qquad N_0 = \frac{C\,I^{1/2}}{I + I^{1/2}}.$$

This implies that for every 100 fish tested, the allocation between treatment and control groups would be

I     Control (per 100 fish)    Each treatment group (per 100 fish)
1     50                        50
2     41.4                      29.3
3     36.60                     21.13
4     33.33                     16.67
5     30.90                     13.82
6     28.99                     11.84
7     27.43                     10.36
8     26.12                     9.23
9     25                        8.33
10    24.03                     7.60
11    23.16                     6.98
12    22.40                     6.47

We see that the allocation is far from equal if I is moderate. For example, if I = 5, the control group gets about 2.24 (that is, $\sqrt{5}$) times as many fish as any of the treatment groups.
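The allocation table can be regenerated from the Lagrange solution above. A short sketch (the function name is hypothetical); it also prints the control to treatment ratio, which equals $\sqrt{I}$.

```python
import math


def optimal_allocation(C, I):
    """Control size N0 and per-treatment size N minimising 1/N0 + 1/N
    subject to N0 + I*N = C; the solution has N0 = N * sqrt(I)."""
    root = math.sqrt(I)
    n_treat = C / (I + root)
    n_control = C * root / (I + root)
    return n_control, n_treat


for I in range(1, 13):
    n0, n = optimal_allocation(100, I)
    print(f"I = {I:2d}: control {n0:5.2f}, each treatment {n:5.2f}, ratio {n0 / n:.3f}")
```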


How effective in increasing the sensitivity of inferences is this departure from equal allocation? To determine the sensitivity of various sizes of tests to detect increases in response rates between the control group and treatment groups, we carried out a series of power calculations. The null and alternative hypotheses considered were

$$H_0: p = p_0 \qquad H_1: p > p_0$$

We estimate $p_0$, $p$ by the sample response rates $\hat p_0$, $\hat p$ and reject $H_0$ at α = 0.05 if

$$2\arcsin\sqrt{\hat p} - 2\arcsin\sqrt{\hat p_0} > 1.645\,(1/N_0 + 1/N)^{1/2}.$$

Simultaneity considerations are ignored in this calculation. The power of this test is calculated for various levels of N, p, $p_0$. The expression for the power is

$$1 - \Phi\left(1.645 - \frac{\phi_1 - \phi_0}{(1/N_0 + 1/N)^{1/2}}\right),$$

where Φ(·) is the standard normal c.d.f. and

$$\phi_1 = 2\arcsin\sqrt{p}, \qquad \phi_0 = 2\arcsin\sqrt{p_0}.$$

Calculations were made for the cases of equal allocation (i.e. N fish per group) and "optimal" allocation (i.e. $N(I+1)/(I + I^{1/2})$ fish in each treatment group and $N(I+1)I^{1/2}/(I + I^{1/2})$ fish in the control group). The usual situation, I = 5, is considered. The results are shown in Table XVIII.1.
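The power expression can be evaluated with the standard library. A minimal sketch (function names hypothetical); the inputs $p_0 = 0.10$, $p = 0.30$, N = 50 are illustrative only and are not entries of Table XVIII.1. For these values the "optimal" allocation raises the power only from about 0.82 to about 0.87.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def arcsine_power(p0, p, n0, n, alpha=0.05):
    """Approximate power of the one-sided arcsine test of H0: p = p0 vs H1: p > p0."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)       # about 1.645 for alpha = 0.05
    phi0 = 2 * math.asin(math.sqrt(p0))
    phi1 = 2 * math.asin(math.sqrt(p))
    shift = (phi1 - phi0) / math.sqrt(1 / n0 + 1 / n)
    return 1 - STD_NORMAL.cdf(z_alpha - shift)


def allocations(N, I):
    """(control, treatment) sizes for equal allocation and for the 'optimal'
    allocation with the same total N(I + 1)."""
    root = math.sqrt(I)
    equal = (N, N)
    optimal = (N * (I + 1) * root / (I + root), N * (I + 1) / (I + root))
    return equal, optimal


# Illustrative values only: p0 = 0.10, p = 0.30, N = 50, I = 5.
equal, optimal = allocations(50, 5)
power_equal = arcsine_power(0.10, 0.30, *equal)
power_optimal = arcsine_power(0.10, 0.30, *optimal)
print(f"equal: {power_equal:.3f}, optimal: {power_optimal:.3f}")
```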
The following conclusions can be drawn from Table XVIII.1.

1. The ability to discriminate between treatment group and control group mortality rates varies considerably with N, $p_0$, p. Thus these calculations provide some idea of the discrimination capability of the test as a function of size. Remember, of course, that these calculations do not account for the effects of heterogeneity among tanks.

2. The effect of allocating more fish to the control group than to the treatment groups is minor. Equal allocation yields nearly as good power as "optimal" allocation and is logistically much simpler.

3. If no assumptions can be made about the magnitudes of survival rates to be expected at the various concentration groups, then equal allocations should be used.

4. If we can say something a priori about the survival rates at the various treatment groups, then we should have larger sample sizes in the lower concentration groups and smaller sample sizes in the higher concentration groups. The aim should be to even out the power of treatment group-control group comparisons as much as possible across groups.

We now consider adjustments in power calculations to take account of tank to tank heterogeneity. We do this by adjusting downward the effective sample size in each group and then entering Table XVIII.1 with the effective sample size. We calculate the adjustment by use of the beta binomial model [21].

Suppose that the test consists of I groups, J tanks per group, and n organisms (fish or embryos) per tank. Thus the actual sample size per group is N = Jn. Let $X_{ij}$ denote the number of responses (dead, abnormal, etc.) in the j-th tank of group i. We assume that $X_{ij}$ is binomially distributed with parameters $(n, p_{ij})$ and that $p_{ij}$ is in turn beta distributed with parameters $(\alpha_i, \beta_i)$. This model allows for random variation of the $p_{ij}$'s within the i-th group. Let

$$\mu_i = \frac{\alpha_i}{\alpha_i + \beta_i}, \qquad \theta_i = \frac{1}{\alpha_i + \beta_i}.$$

Then

$$E(p_{ij}) = \mu_i, \qquad \mathrm{Var}(p_{ij}) = \mu_i(1 - \mu_i)\,\frac{\theta_i}{1 + \theta_i}.$$

The unconditional distribution of $X_{ij}$ is beta binomial with mean and variance

$$E(X_{ij}) = n\mu_i, \qquad \mathrm{Var}(X_{ij}) = n\mu_i(1 - \mu_i)\,\frac{1 + n\theta_i}{1 + \theta_i}, \qquad 0 < \theta_i < \infty.$$

Assume that $\theta_i = \theta$ is constant across groups. Let

$$K = \frac{1 + n\theta}{1 + \theta}.$$

This is the variance inflation factor due to tank to tank heterogeneity. The effective sample size per tank is then n/K, and so the effective sample size per group is

$$N_{\mathrm{eff}} = \frac{Jn}{K} = N\,\frac{1 + \theta}{1 + n\theta}.$$

As θ → 0, $N_{\mathrm{eff}}$ → N, the number of organisms. As θ → ∞, $N_{\mathrm{eff}}$ → J, the number of tanks. As n → ∞, $N_{\mathrm{eff}}$ → J(1 + θ)/θ. Thus the effective number of organisms per tank levels off as the actual number increases. Figure XVIII.1 shows a plot of $n_{\mathrm{eff}} = n(1 + \theta)/(1 + n\theta)$ vs n for various values of θ. We see the diminishing returns of placing more and more fish per tank in the presence of tank to tank heterogeneity. However, under the cost structure assumed in this section we still place the maximum number of organisms within each tank, which we assume is 50 for embryos and 25 for fry. These numbers of course are only working assumptions.

To get some feeling for the meaning of θ in terms of variance inflation factors, we calculate the factors corresponding to n = 25 and n = 50 for various values of θ.

θ                                     0.01   0.025   0.05   0.10   0.50   1.00
Variance inflation factor, n = 25     1.24   1.59    2.14   3.18   9.0    13
Variance inflation factor, n = 50     1.5    2.2     3.3    5.5    17.3   25.5
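The inflation factors in the table, and the levelling off of $n_{\mathrm{eff}}$, follow from the two formulas above. A minimal sketch (function names hypothetical); the second loop uses θ = 0.05 purely for illustration, where $n_{\mathrm{eff}}$ approaches (1 + θ)/θ = 21 however many organisms are placed in a tank.

```python
def inflation_factor(n, theta):
    """Beta-binomial variance inflation K = (1 + n*theta) / (1 + theta)."""
    return (1 + n * theta) / (1 + theta)


def effective_n_per_tank(n, theta):
    """Effective organisms per tank, n / K = n(1 + theta) / (1 + n*theta)."""
    return n / inflation_factor(n, theta)


thetas = [0.01, 0.025, 0.05, 0.10, 0.50, 1.00]
for n in (25, 50):
    factors = ", ".join(f"{inflation_factor(n, t):.2f}" for t in thetas)
    print(f"n = {n}: {factors}")

# Diminishing returns: n_eff approaches (1 + theta) / theta as n grows.
for n in (10, 25, 50, 100, 1000):
    print(f"theta = 0.05, n = {n:4d}: n_eff = {effective_n_per_tank(n, 0.05):.1f}")
```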


We calculated variance inflation factors for several sets of Fathead Minnow data in Section IX. These were

1. Holcombe and Phipps compound D fry mortality
n = 25, K = 1.337 = (1 + 25θ)/(1 + θ)
Thus θ = 0.014 and $n_{\mathrm{eff}}$ = 18.78. Thus $N_{\mathrm{eff}}$ = 75.

2. Jarvinen compound B embryo mortality
n = 50, K = 3.071 = (1 + 50θ)/(1 + θ)
Thus θ = 0.044 and $n_{\mathrm{eff}}$ = 16.31. Thus $N_{\mathrm{eff}}$ = 65.
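Solving K = (1 + nθ)/(1 + θ) for θ gives θ = (K − 1)/(n − K), which reproduces the two examples above. In this sketch J = 4 tanks per group is an assumption, taken because it is what the quoted $N_{\mathrm{eff}}$ values correspond to; the function name is hypothetical. Computing $n_{\mathrm{eff}}$ directly as n/K gives 18.70 for the fry data; the 18.78 in the text comes from substituting the rounded θ = 0.014, and both lead to $N_{\mathrm{eff}} \approx 75$.

```python
def theta_from_inflation(n, K):
    """Solve K = (1 + n*theta) / (1 + theta) for theta."""
    return (K - 1) / (n - K)


# (label, organisms per tank n, observed inflation factor K)
studies = [("Holcombe and Phipps, compound D fry", 25, 1.337),
           ("Jarvinen, compound B embryos", 50, 3.071)]
J = 4   # tanks per group (assumed, consistent with the N_eff values quoted)

summary = {}
for label, n, K in studies:
    theta = theta_from_inflation(n, K)
    n_eff = n / K
    summary[label] = (theta, n_eff, J * n_eff)
    print(f"{label}: theta = {theta:.4f}, n_eff = {n_eff:.2f}, N_eff = {J * n_eff:.0f}")
```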
C. Expected Precision for Comparison of Treatment Group and Control Group Survival Probabilities

In the previous subsection we calculated the power to be expected for pairwise treatment group-control group comparisons of survival probabilities as a function of $p_0$, p, N. In this subsection we calculate the expected half lengths of two sided 95 percent confidence intervals on $p - p_0$ for these same combinations of $p_0$, p, N. We again assume no tank to tank heterogeneity within groups and account for such heterogeneity by reducing the effective sample sizes, as discussed in the previous subsection. We base the precision calculations on asymptotic normal theory. Namely, the confidence interval half length is calculated as $1.96\,[(p_0 q_0 + pq)/N]^{1/2}$, where $q = 1 - p$ and $q_0 = 1 - p_0$. Asymptotic normality may not be a very good assumption when N = 50 or when $p_0$ = 0.001, but it is only being used for planning purposes.
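The half length formula is a one-liner. The sketch below (function name hypothetical) uses illustrative values that are not entries from Table XVIII.2, and shows how replacing N = 100 by the effective size 100/1.337 from the Holcombe and Phipps fry example widens the interval.

```python
import math


def ci_half_length(p0, p, N, z=1.96):
    """Approximate half length of a two-sided 95% interval for p - p0,
    with N organisms in each group (normal approximation)."""
    return z * math.sqrt((p0 * (1 - p0) + p * (1 - p)) / N)


# Illustrative values, not entries from Table XVIII.2.
print(f"{ci_half_length(0.10, 0.30, 100):.3f}")          # actual sample size N = 100
print(f"{ci_half_length(0.10, 0.30, 100 / 1.337):.3f}")  # effective size with K = 1.337
```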


TABLE XVIII.2 EXPECTED HALF LENGTHS OF 95 PERCENT TWO SIDED CONFIDENCE INTERVALS FOR COMPARISONS BETWEEN TREATMENT GROUP AND CONTROL RESPONSE RATES

$p_0$ = 0.001


D. Power Calculations for Quantitative Weight Response

In previous subsections we calculated expected power and expected estimation precision for the quantal survival response. In this subsection we carry out similar calculations for the quantitative weight response. Distributional assumptions are based on the results of analyzing the Holcombe and Phipps compound D fry weights in Section XVII. There was considerable group to group variation in survival proportions, but not as much variation in the weights of the surviving fry. In particular

                               Control   Group 4   Group 5   Group 6
Survivors                      94/100    87/100    21/100    0/100
Avg. wt. of survivors (mg)     131.2     113.1     108       -
Max. wt. (mg)                  190       186       195       -
Min. wt. (mg)                  45        29        34        -
Std. dev. (mg)                 25.8      34.0      36.7      -

Variability is not too dependent on concentration group or on survival rate. The variance components are assumed to be constant across treatment groups.

The power and precision calculations below are based on a number of assumptions.

1. There is no tank to tank variation within treatment groups. We discuss corrections for such variation later in the subsection.

2. Equal sample sizes among treatment and control groups. This assumption is reasonable if we confine comparisons of weight gains to treatment groups with mortality rates not greatly in excess of the control rate. Otherwise an average or minimum N might be used.

3. Constant variability across treatment groups. This assumption might hold for the weights themselves or for some function of the weights, such as log weights.

4. There are enough observations to have effectively an infinite number of degrees of freedom. The power obtained with 12 d.f. is nearly that obtained with infinite d.f.

5. No simultaneity correction is applied.

Table XVIII.3 shows the power of a one-sided normal theory test of

$H_0: \mu = \mu_0$ vs. $H_1: \mu < \mu_0$,

where $\mu$, $\mu_0$ are the average weights in the treatment and control groups
respectively. In the absence of tank effects the individual weights are
assumed to have standard deviation $\sigma$. The bottom portion of the table
contains factors $C$ for constructing 95 percent lower confidence bounds
on $\mu_0 - \mu$, namely

$\mu_0 - \mu \ge \bar{X}_0 - \bar{X} - C\sigma$.


The power calculations and precision factors in Table XVIII.3 need
to be adjusted for tank to tank variation within groups. Suppose that
there are J tanks per group and n organisms per tank, so that N = nJ
organisms per group, and that $\sigma_T^2$, $\sigma_e^2$ represent the between
and within tank components of variation. The variance of the average
weight in the group is then

$\frac{\sigma_T^2}{J} + \frac{\sigma_e^2}{N} = \frac{\sigma_e^2 + n\sigma_T^2}{N}$.

Table XVIII.3 is entered at N and at

$d = (\mu_0 - \mu)/(\sigma_e^2 + n\sigma_T^2)^{1/2}$.

The precision factors at the bottom of the table remain the same, but the
confidence bound becomes

$\mu_0 - \mu \ge \bar{X}_0 - \bar{X} - C(\sigma_e^2 + n\sigma_T^2)^{1/2}$.

We now apply these relations to the weight gain data from the Holcombe
and Phipps test on compound D. In that example there is no statistical
evidence of tank to tank variation within groups. Namely,

$\hat\sigma_e^2 = 900.354$ with 366 d.f.,
$\hat\sigma_e^2 + 19.191\hat\sigma_T^2 = 754.97$ with 15 d.f.

Thus

$\hat\sigma_T^2 = (754.97 - 900.354)/19.191 = -7.58$,

and we assume it is 0. Since $\sigma_e^2$ is estimated with 366 degrees of freedom,
we assume it is known exactly. Thus $\sigma_e = 30.0$. The average N in groups
1-4 is 91.24, while the sample size in group 5 is 21. Assume for the purpose
of power calculations that $\mu_0 = \bar{X}_0$, $\mu = \bar{X}$. Thus $\mu_0 = 131.245$,
$\mu_3 = 127.924$, $\mu_4 = 113.080$, $\mu_5 = 108.095$.

Therefore,

$d_3 = (\mu_0 - \mu_3)/\sigma_e = (131.245 - 127.924)/30.0 = 0.111$
$d_4 = (\mu_0 - \mu_4)/\sigma_e = (131.245 - 113.080)/30.0 = 0.606$
$d_5 = (\mu_0 - \mu_5)/\sigma_e = (131.245 - 108.095)/30.0 = 0.772$

Interpolating (approximately) in Table XVIII.3 with N = 90 and $d_3$, $d_4$
yields

Control vs group 3: Power = 0.16

Control vs group 4: Power = 0.99

For group 5 the assumption of equal N is not reasonable, and so we calculate
the noncentrality parameter for the test as $d_5/(1/N_0 + 1/N_5)^{1/2} = 3.186$.
Thus power $= \Phi(\text{noncentrality} - 1.645) = \Phi(3.186 - 1.645) = \Phi(1.541) = 0.94$.
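The power figures above can also be computed directly from the normal approximation rather than read from Table XVIII.3. The Python sketch below assumes infinite degrees of freedom (assumption 4) and a one-sided 5 percent test; the function names `phi`, `tank_component`, and `one_sided_power` are illustrative, not from the report. Because it uses the formula instead of table interpolation, the group 3 power it gives may differ slightly from the interpolated 0.16.

```python
import math


def phi(z):
    """Standard normal c.d.f. computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


Z_05 = 1.645  # one-sided 5 percent normal critical value used in the text


def tank_component(ms_tank, ms_within, coef):
    """Moment estimate of sigma_T^2 from E(ms_tank) = sigma_e^2 + coef * sigma_T^2.

    Returns the raw estimate and the estimate truncated at zero."""
    raw = (ms_tank - ms_within) / coef
    return raw, max(0.0, raw)


def one_sided_power(d, n0, n1):
    """Power of the one-sided normal-theory test of mu = mu_0 vs mu < mu_0.

    d is (mu_0 - mu) divided by the per-fish standard deviation; n0 and n1
    are the control and treatment sample sizes (infinite d.f. assumed)."""
    noncentrality = d / math.sqrt(1.0 / n0 + 1.0 / n1)
    return phi(noncentrality - Z_05)


# Holcombe and Phipps compound D values quoted in the text.
raw, sigma_t2 = tank_component(754.97, 900.354, 19.191)
sigma = 30.0  # sqrt(900.354) rounded as in the text; sigma_T^2 taken as 0
means = {0: 131.245, 3: 127.924, 4: 113.080, 5: 108.095}
d = {g: (means[0] - means[g]) / sigma for g in (3, 4, 5)}

power_4 = one_sided_power(d[4], 90, 90)     # equal-N approximation, N = 90
power_5 = one_sided_power(d[5], 91.24, 21)  # unequal N for group 5

print(f"raw sigma_T^2 estimate: {raw:.2f} (truncated to {sigma_t2})")
for g in (3, 4, 5):
    print(f"d_{g} = {d[g]:.3f}")
print(f"power, group 4: {power_4:.2f}; group 5: {power_5:.2f}")
```

Passing unequal `n0`, `n1` to the same function covers the group 5 case, which is why the sketch uses the general $(1/N_0 + 1/N_1)^{1/2}$ form throughout.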


Unequal Allocations of Testing Effort Among Treatment Groups


Standard test guidelines call for equal numbers of tanks to be run
at each treatment group. Such a design would be sensible only if, prior
to running the test, there was total ignorance about the response levels to
be expected. That is, suppose it was thought a priori that the mortality rate
for each treatment group could be anywhere between 0 and 100 percent. Then
it would make good sense to allocate experimental effort equally among
treatment groups to assure specified power whenever and wherever the mortality
rate exceeds that in the control group by a specified amount. However, if,
on the basis of either a priori scientific information or previous testing,
some information was available concerning the mortality rates to be expected
at the various treatment groups, then unequal allocation of experimental
effort would be preferable. In particular, at the higher treatment groups,
where mortality would be expected to be substantially higher than the control
rate, it is easy to detect differences from the control. Thus the
experimental effort should be decreased at these groups. At the lower
treatment groups, where it is more difficult to detect differences from the
control group, the experimental effort should be increased to improve
sensitivity. Thus the degree of experimental effort should in general
decrease as the toxicant level increases.

Details of a procedure for arriving at an unequal allocation will be
discussed in the report on phase 2, for Daphnia magna. For the purpose
of this subsection, consider the following illustration of the effects
of unequal replication on sensitivity. Suppose that the experiment is to
consist of a control group and I = 5 treatment groups. Suppose that cost
and logistical constraints limit the number of tanks to 24, that n = 25
fry will be exposed in each tank, and that tank to tank heterogeneity
is such that the variance inflation factor is 1.5. Suppose it is felt
that the control group mortality rate will be about 0.05 and the mortality
rates in the treatment groups will be about 0.10, 0.15, 0.20, 0.40, and
0.80 respectively. The classical allocation would be to run J = 4 tanks
per group, giving N = 100 fish per group. The effective sample sizes would
then be N_eff = 100/1.5 = 66.67 fish per group. Suppose further that it is
considered important (biologically and/or legally) to detect increases in
mortality of 10 percent above the control rate. That is, we wish to detect
differences between 5 percent and 15 percent mortality.

Under the classical allocation scheme the power to be expected for
each treatment group-control group comparison would be:

Group 2 vs control

$1 - \beta = \Phi[(2\arcsin\sqrt{0.10} - 2\arcsin\sqrt{0.05})/(2/66.67)^{1/2} - 1.645]$
$= \Phi[(0.64 - 0.45)/(2/66.67)^{1/2} - 1.645] = \Phi(-0.53) = 0.30$

Group 3 vs control

$1 - \beta = \Phi[(0.80 - 0.45)/(2/66.67)^{1/2} - 1.645] = \Phi(0.38) = 0.65$

Group 4 vs control

$1 - \beta = \Phi[(0.93 - 0.45)/(2/66.67)^{1/2} - 1.645] = \Phi(1.13) = 0.87$

Group 5 vs control

$1 - \beta = \Phi[(1.37 - 0.45)/(2/66.67)^{1/2} - 1.645] = \Phi(3.67) = 1.000$

Group 6 vs control

$1 - \beta = \Phi[(2.21 - 0.45)/(2/66.67)^{1/2} - 1.645] = \Phi(8.52) = 1.000$

Consider the modified allocation of 7, 6, 6, 3, 1, 1 tanks in the
control group and in each of the treatment groups respectively. Then
$N_1 = 175$, $N_2 = 150$, $N_3 = 150$, $N_4 = 75$, $N_5 = N_6 = 25$. The effective sample
sizes, $N_{eff} = N/1.5$, are then 116.67, 100, 100, 50, 16.67, 16.67. The
power to be expected for each treatment group-control group comparison
would then be:

Group 2 vs control

$1 - \beta = \Phi[(2\arcsin\sqrt{0.10} - 2\arcsin\sqrt{0.05})/(1/116.67 + 1/100)^{1/2} - 1.645]$
$= \Phi[(0.64 - 0.45)/0.136 - 1.645] = \Phi(-0.25) = 0.40$

Group 3 vs control

$1 - \beta = \Phi[(0.80 - 0.45)/(1/116.67 + 1/100)^{1/2} - 1.645] = \Phi(0.923) = 0.82$

Group 4 vs control

$1 - \beta = \Phi[(0.93 - 0.45)/(1/116.67 + 1/50)^{1/2} - 1.645] = \Phi(1.195) = 0.88$

Group 5 vs control

$1 - \beta = \Phi[(1.37 - 0.45)/(1/116.67 + 1/16.67)^{1/2} - 1.645] = \Phi(1.87) = 0.97$

Group 6 vs control

$1 - \beta = \Phi[(2.21 - 0.45)/(1/116.67 + 1/16.67)^{1/2} - 1.645] = \Phi(5.08) = 1.00$
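The two sets of hand calculations can be reproduced in a few lines. The Python sketch below applies the same arcsine transform, $2\arcsin\sqrt{p}$ with variance about $1/N_{eff}$, and a one-sided 5 percent test; the helper names are illustrative. Because it carries the transformed values at full precision instead of rounding them to two decimals, some powers (for example the 0.15 comparison) come out a little lower than the hand-computed 0.65 and 0.82, although the comparison between allocations is unchanged.

```python
import math


def phi(z):
    """Standard normal c.d.f. computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def arcsine_power(p0, p1, neff0, neff1, z=1.645):
    """Approximate power of a one-sided test that mortality p1 exceeds p0.

    Uses the transform 2*arcsin(sqrt(p)), whose variance is about 1/N_eff."""
    delta = 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p0))
    return phi(delta / math.sqrt(1 / neff0 + 1 / neff1) - z)


CONTROL_RATE = 0.05
TREATMENT_RATES = [0.10, 0.15, 0.20, 0.40, 0.80]  # groups 2..6


def allocation_powers(tanks, fish_per_tank=25, inflation=1.5):
    """tanks[0] is the control; tanks[1:] are treatment groups 2..6."""
    neff = [t * fish_per_tank / inflation for t in tanks]
    return [arcsine_power(CONTROL_RATE, p, neff[0], n)
            for p, n in zip(TREATMENT_RATES, neff[1:])]


equal = allocation_powers([4, 4, 4, 4, 4, 4])
unequal = allocation_powers([7, 6, 6, 3, 1, 1])
for p, a, b in zip(TREATMENT_RATES, equal, unequal):
    print(f"mortality {p:.2f}: equal {a:.2f}, unequal {b:.2f}")
```

Other candidate allocations can be screened the same way by changing the list passed to `allocation_powers`, as long as the tank counts still sum to the 24 tanks available.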

Comparison of the two sets of calculations shows that we have improved
the power of the comparisons at the low concentration end of the test without
sacrificing any appreciable power at the high concentration end of the test.
This has been done without increasing the size of the test. If we were
willing to add several additional tanks we could do even better. The power
for the comparison of the mortality rate 0.15 vs the control rate has been
increased from 0.65 to 0.82. This is a substantial improvement because
the chance of not detecting such an increase in mortality has diminished
from 0.35 (1 in 2.86) to 0.18 (1 in 5.56). This roughly halves the type 2
error probability.

The previously suggested unequal allocation of experimental effort
to concentration groups is intuitively sensible, improves sensitivity at
the low end of the experiment where it is most needed, does not appreciably
diminish sensitivity at the high end of the experiment, and does not increase
the overall size of the test. It requires specifying prior beliefs about
the response levels to be expected at the various concentration levels.

A scheme for doing this will be discussed in the phase 2 report. However,
it is clear that the more definitive the prior information, the more
unequal the allocation should be. As prior information diminishes to
complete ignorance, the design should approach equal allocation. Of
course, the efficacy of the design depends on the accuracy of the prior
information. If it is believed a priori that group 6 will have a mortality
rate between 75 percent and 100 percent but it in fact has a 15 percent
mortality rate, then an unequal tank allocation scheme will probably
do worse than an equal tank allocation scheme.


[Figure: Effective sample size per group vs. sample size for various parameter values]


XIX  DESIGN  AND  ANALYSIS  CONSIDERATIONS  FOR  FULL  LIFE  CYCLE  TESTS 
DISTINCT  FROM  THOSE  FOR  EARLY  LIFE  STAGE  TESTS 




In recent years, efforts in aquatic toxicity testing have been shifting
more and more toward early life stage tests and away from full life cycle
tests. The results obtained from a full life cycle test are directly
analogous to those obtained from early life stage tests; there are simply
many more of them. Mortality rates are recorded periodically, length and
weight measurements are obtained periodically, and fecundity responses such
as embryos per spawn, total numbers of embryos, and spawns per female are
recorded. With respect to the survival, weight, and length responses,
the design and analysis considerations discussed in the previous sections
are directly applicable and require no amplifications or modifications.

Some differences in design and analysis considerations may be called
for with respect to the fecundity responses. Questions of homogeneity of
variances, tank to tank heterogeneity within groups, and the form of the dose
response relation are handled in essentially the same manner as the analogous
questions for the mortality and weight responses. Similarly, sample size
determination and tank allocation design calculations need to be made in the
same manner as those carried out for the mortality and weight responses.

The statistical issues are the same, but the numbers may turn out to be
different.

One design consideration associated with fecundity responses may well
introduce important differences as compared with those for the mortality and
weight responses. Namely, it has been assumed that the cost structure
is such that there is a certain incremental cost associated with adding an
additional tank to the test, but once the tank is added, the fish are free.
This leads to the recommendation to run as many tanks as can be afforded
and fill each tank with the maximum number of embryos or fry that is
biologically sensible. This cost structure may not hold for fecundity
responses. There is a considerable amount of operational and clerical effort
associated with accumulating the hatched embryos, counting them, associating
them with the appropriate fish or groups of fish, and properly recording the
data. The numbers of embryos produced are related to the numbers of fish
rather than to the numbers of tanks. Thus fish can no longer be considered
free. Effects of competition on production must also be considered. In
the presence of tank effects it may thus be sensible to increase the
number of tanks in the test and decrease the number of fish per tank.

This may improve the precision of statistical inferences without incurring
additional expense. The particular trade off between the number of tanks and
the number of fish per tank would of course depend on the extent of tank to
tank heterogeneity and on the tank and fish costs.
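The trade off can be explored numerically with the variance of a group mean used in Section XVIII, $(\sigma_e^2 + n\sigma_T^2)/(nJ)$. The Python sketch below is illustrative only: it assumes a simple linear cost model (a fixed cost per tank plus a cost per fish), and the budget, costs, and variance components in the usage lines are hypothetical numbers, not values from any test. For each number of fish per tank it uses as many tanks as the budget allows and keeps the design with the smallest variance.

```python
def group_mean_variance(sigma_t2, sigma_e2, tanks, fish_per_tank):
    """Variance of a group mean: (sigma_e^2 + n sigma_T^2) / (n J)."""
    n = fish_per_tank
    return (sigma_e2 + n * sigma_t2) / (n * tanks)


def best_design(budget, cost_tank, cost_fish, sigma_t2, sigma_e2, max_fish=200):
    """Search fish-per-tank values; for each, use as many tanks as the budget
    allows. Returns (variance, fish_per_tank, tanks) with least variance."""
    best = None
    for n in range(1, max_fish + 1):
        tanks = int(budget // (cost_tank + n * cost_fish))
        if tanks < 1:
            break  # a larger n only makes a single tank more expensive
        v = group_mean_variance(sigma_t2, sigma_e2, tanks, n)
        if best is None or v < best[0]:
            best = (v, n, tanks)
    return best


# Hypothetical inputs: budget and costs in arbitrary units per group.
variance, n, tanks = best_design(budget=2400, cost_tank=100, cost_fish=1,
                                 sigma_t2=10.0, sigma_e2=900.0)
print(f"{tanks} tanks of {n} fish, variance of group mean {variance:.3f}")

# With free fish (cost_fish=0) the search fills every tank to max_fish,
# which matches the "fish are free" recommendation for mortality and weight.
free = best_design(budget=2400, cost_tank=100, cost_fish=0,
                   sigma_t2=10.0, sigma_e2=900.0)
print(free)
```

The search is a brute-force choice made for transparency; with continuous n the classical two-stage sampling optimum is $n = \sqrt{(c_{tank}/c_{fish})(\sigma_e^2/\sigma_T^2)}$, and the integer constraint on the number of tanks shifts the best design somewhat away from it.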

All of these issues arise in the design and analysis of toxicity tests
on Daphnia magna. Survival, length, and fecundity responses are reported
at periodic intervals. Depending on the design of the test, the fecundity
responses reported by various investigators include average embryos per
surviving female per chamber, embryos per surviving female, or total number
of embryos. Questions of multiple daphnids or individual daphnids per beaker
are commonly posed. Thus most of the design and analysis considerations
involved in full life cycle tests that are distinct from those in the early
life stage tests will be addressed in the course of the discussion of the
design and analysis of the Daphnia toxicity tests.
APPENDIX AII. EARLY LIFE STAGE DATA SETS USED AS EXAMPLES IN THE BODY OF
THE REPORT


This appendix contains listings of data sets from four early life
stage toxicity tests. These data are used to illustrate the
procedures discussed in the body of the report. These data sets are

Benoit - compound A

DeFoe - compound C

Holcombe and Phipps - compound D

Jarvinen - compound B.

The three types of data (survival, weight, and toxicant concentration)
are represented in three "card types." The first six entries on each card
are the same across card types: treatment group (col 2), replicate
designation (col 4), card type (col 6), card number (cols 7-8), investigator
code (cols 9-10), and test code (cols 11-12). This provides enough
information to sort the cards by investigator, experiment, type, group, and
sequence should the data become disarranged. Card type 1 (survival data)
contains in addition the number of embryos tested (cols 16-20), number
hatched live (cols 21-25), number of fry tested (cols 31-35), number live at
end of test (cols 36-40), and number normal at end of test (cols 41-45).
Card type 2 (weight data) contains the number of weights recorded from that
particular chamber (cols 14-15) and the individual weights (5 cols per
weight, up to 13 weights per card). Card type 3 (toxicant concentration)
contains month (cols 16-17), day (cols 18-19), year (cols 20-21), and
toxicant concentration (cols 32-38), one determination per card. At the
head of each type of information several lines of descriptive text are
given.
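The fixed-column card layout described above can be decoded mechanically. The Python sketch below encodes the stated column ranges for the shared fields and for card types 1 and 3; card type 2 is omitted because the starting column of the individual weights is not stated, and columns 26-30 of type 1 cards are not described in the text. The field names, `parse_card`, and the `make_card` helper used to build a test card are hypothetical, and the values in the example are illustrative.

```python
# Column ranges (1-indexed, inclusive) taken from the card description.
COMMON = {
    "group": (2, 2),
    "replicate": (4, 4),
    "card_type": (6, 6),
    "card_number": (7, 8),
    "investigator": (9, 10),
    "test_code": (11, 12),
}
TYPE_1 = {  # survival data
    "embryos_tested": (16, 20),
    "hatched_live": (21, 25),
    "fry_tested": (31, 35),
    "alive_at_end": (36, 40),
    "normal_at_end": (41, 45),
}
TYPE_3 = {  # toxicant concentration, one determination per card
    "month": (16, 17),
    "day": (18, 19),
    "year": (20, 21),
    "concentration": (32, 38),
}


def field(card, first, last):
    """Return columns first..last (1-indexed, inclusive), stripped of blanks."""
    return card[first - 1:last].strip()


def parse_card(card):
    """Decode the shared fields, then the fields for card type 1 or 3."""
    record = {name: field(card, a, b) for name, (a, b) in COMMON.items()}
    extra = {"1": TYPE_1, "3": TYPE_3}.get(record["card_type"], {})
    for name, (a, b) in extra.items():
        record[name] = field(card, a, b)
    return record


def make_card(values, layout, width=80):
    """Build a card image by right-justifying each value in its columns."""
    cols = [" "] * width
    for name, value in values.items():
        a, b = layout[name]
        text = str(value).rjust(b - a + 1)[: b - a + 1]
        cols[a - 1:b] = list(text)
    return "".join(cols)


card = make_card(
    {"group": 1, "replicate": "A", "card_type": 1, "card_number": 1,
     "investigator": "01", "test_code": "01", "embryos_tested": 30,
     "hatched_live": 24, "fry_tested": 15, "alive_at_end": 15,
     "normal_at_end": 15},
    {**COMMON, **TYPE_1},
)
record = parse_card(card)
survival = int(record["alive_at_end"]) / int(record["fry_tested"])
print(record, survival)
```

Keeping the column ranges in dictionaries means a type 2 layout could be added once its column positions are confirmed from the original data sheets.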


Appendix AII is the appendix for Section II.
10401

030679 

1.10 

5 

b 

3 

20401 

031379 

C.  59 

5 

b 

3 

30*01 

032079 

1.10 

5 

b 

3 

40*01 

032779 

0.  5 1 

6 

A 

3 

10401 

030879 

1.10 

6 

A 

3 

20401 

031579 

0.96 

-  * 

A 

3 

30401 

032279 

1.90 

6 

A 

3 

40401 

032979 

1.10 

o 

b 

3 

10*01 

030679 

1.40 

6 

b 

3 

20401 

031379 

0.6<5 

6 

b 

3 

30401 

032079 

1.60 

to 

b 

3 

40401 

032779 

1.10 



INVESTIGATOR: DUANE BENOIT (01), TEST: A (01)

DATA FROM EARLY LIFE STAGE TESTS WITH FATHEAD MINNOWS
SIX LEVELS: 1(CONTROL), 2(LOWEST), ..., 6(HIGHEST); 4 REPS EA (A,B,C,D)
# EMBRYOS TESTED, # ALIVE AFTER HATCH, # NORMAL FRY AFTER HATCH
# FRY TESTED, # ALIVE AT END, # NORMAL AT END


1  A 

i 

10101 

30 

2  A 

23 

15 

15 

15 

1  -b 

i 

40101 

30 

2b 

2b 

15 

15 

15 

1  C 

l 

10101 

30 

20 

20 

15 

15 

15 

1  0 

l 

10101 

30 

2  A 

23 

15 

15 

15 

2  A 

i 

10101 

30 

22 

21 

15 

15 

15 

2  0 

i 

10101 

30 

21 

21 

15 

1  A 

1A 

2  C 

i 

10101 

30 

21 

21 

15 

15 

15 

2  0 

i 

10101 

30 

lb 

lb 

15 

15 

15 

i  A 

l 

10101 

30 

17 

lb 

15 

1A 

IA 

3  a 

i 

10101 

30 

20 

20 

15 

1A 

1A 

3  0 

l 

10404 

30 

22 

21 

15 

1  5 

15 

3  0 

l 

10101 

30 

22 

21 

15 

15 

15 

A  A 

l 

10101 

30 

12 

11 

15 

12 

12 

A  B 

l 

10101 

30 

48 

18 

-1 5 

11 

11 

A  C 

l 

10101 

30 

22 

22 

15 

1A 

1A 

A  0 

l 

10101 

30 

25 

2a 

15 

1A 

1A 

3  A 

l 

10101 

30 

20 

20 

15 

05 

09 

5  b 

l 

10101 

30 

17 

17 

15 

Ot 

06 

5  C 

l 

10101 

30 

17 

lb 

15 

OS 

09 

5  D 

i 

10101 

30 

2  A 

2  A 

15 

06 

08 

b  A 

l 

10101 

30 

19 

18 

15 

08 

08 

b  D 

l 

10101 

30 

20 

19 

15 

03 

03 

b  c 

i 

10101 

30 

2b 

25 

15 

12 

12 

b  c 

l 

10101 

30 

21 

21 

15 

1C 

10 

INVESTIGATOR: DUANE BENOIT (01), TEST: A (01)

DATA FROM EARLY LIFE STAGE TESTS WITH FATHEAD MINNOWS
SIX LEVELS: 1(CONTROL), 2(LOWEST), ..., 6(HIGHEST); 4 REPS EA (A,B,C,D)
INDIVIDUAL WEIGHTS (MG) OF ALL FISH ALIVE AT END OF TEST (1-2 CARDS/TANK)
NUMBER OF WEIGHTS & LIST OF WEIGHTS


1 

A 

2 

10101 

1A 

16A 

152 

16A 

123 

13C 

09  A 

1 A 1 

150 

2C5 

128 

08b 

070 

lit 

1 

A 

2 

20101 

1A 

139 

1 

b 

2 

lOlOl 

45 

090 

42  5 

180 

129 

102 

165 

103 

135 

170 

lao 

1  AO 

092 

162 

1 

b 

2 

20101 

15 

139 

106 

1 

c 

2 

10101 

15 

175 

130 

102 

143 

121 

131 

172 

120 

150 

090 

133 

125 

121 

1 

C 

2 

-20404 

15  152 

— 14-8  - 

- - 

_ - - 

....  — 

- 

1 

U 

2 

10101 

15 

136 

oao 

123 

116 

133 

100 

100 

090 

183 

190 

073 

O0A 

090 

1 

0 

2 

20101 

15 

152 

160 

2 

-A 

-2  -10401  -15 

-  090 

— 2AQ- 

425- 

-082 

-08  A 

420  - 

190 

11A 

420 

108 

107 

4  58 

126 

2 

A 

2 

20101 

15 

110 

060 

2 

0 

2 

10101 

1A 

123 

1A5 

112 

180 

083 

12A 

100 

155 

111 

107 

090 

161 

1  A2 

2 

a 

2 

20104- 

16 

090 

.  . 

- - 

.  . 

2 

c 

2 

10101 

15 

083 

13A 

1  IA 

118 

18C 

160 

130 

1  BO 

132 

190 

08A 

120 

132 

2 

c 

2 

20101 

15 

131 

109 

2 

D 

2 

lOlOl 

15 

180 

062 

452 

480 

173 

438 

130 

092 

133 

130 

G8o 

1  aO 

1  15 

2 

D 

2 

20101 

15 

093 

137 

3 

A 

2 

10101 

1A 

110 

159 

09a 

1  a8 

08C 

151 

121 

096 

1AA 

132 

100 

112 

096 

3 

-A 

2 

20101 

14 

460 

- - 

-  -  — 

— 

-  . . 

— - 

3 

a 

2 

10101 

1A 

113 

162 

070 

141 

1  3  A 

090 

12A 

2A8 

182 

138 

09a 

098 

131 

5 

a 

2 

20101 

1A 

1A6 

3 

c 

2 

10101 

45 

140 

07  3 

240- 

-  119 

- 1 30 

110 

090 

06A 

080 

193 

123 

066 

15C 

3 

c 

2 

20101 

15 

162 

118 

3 

0 

2 

10101 

15 

110 

150 

200 

078 

093 

1 A7 

115 

122 

101 

126 

092 

153 

1  03 

3 

D 

2 

20101 

15 

120 

070 

A 

A 

2 

10101 

12 

131 

130 

112 

132 

092 

132 

112 

099 

106 

1  AO 

125 

OVA 

A 

a 

2 

10101 

11 

083 

136 

1 A8 

060 

46  2 

139 

157 

121 

091 

1 A6 

1  3  A 

A 

C 

2 

10101 

15 

182 

086 

140 

106 

105 

165 

130 

1 0  A 

09  j 

115 

122 

099 

1  A5 

A 

C 

2 

20101 

15 

117 

09A 

4 

0 

c 

10101 

14 

121 

140 

2  50 

132 

132 

114 

123 

105 

llo 

154 

120 

131 

4 

0 

Z 

20101 

14 

111 

07'C 

5 

A 

z 

10101 

09 

043 

111 

100 

064 

111 

090 

050 

150 

5 

b 

z 

10101 

Ob 

054 

117 

127 

098 

12  1 

157 

5 

c 

z 

10101 

09 

070 

0b5 

184 

144 

125 

084 

112 

071 

106 

5 

0 

z 

10101 

08 

080 

130 

071 

166 

11C 

07  8 

137 

07  8 

0 

A 

z 

10101 

0* 

-024 

034 

025 

010 

021 

014 

051 

018 

6 

6 

z 

10101 

03 

037 

031 

007 

6 

C 

z 

10101 

12 

041 

059 

051 

031 

035 

049 

033 

033 

052 

030 

034 

052 

O 

0 

z 

10101 

10 

026 

020 

048 

013 

042 

082 

037 

024 

042 

on 

INVESTIGATOR: DUANE BENOIT (01), TEST: A (01)

DATA FROM EARLY LIFE STAGE TESTS WITH FATHEAD MINNOWS

SIX LEVELS: 1(CONTROL), 2(LOWEST), ..., 6(HIGHEST); 4 REPS EA (A,B,C,D)

SEE DATA SHEET FOR NOTES

MEASURED CONCENTRATIONS OF TOXICANT (MICRO G/L)


1 

A 

3 

10101 

061179 

0.06 

1 

A 

3 

20101 

062579 

0.08 

1 

A 

3 

30101 

070979 

C  .  1 5 

1 

d 

3 

10101 

060679 

0.00 

1 

b 

3 

20101 

061879 

0.08 

1 

o 

3 

30101 

070279 

0.10 

1 

c 

3 

10101 

060879  - 

0.07 

1 

c 

3 

20101 

062179 

0.06 

1 

C 

3 

30101 

070579 

C  .13 

1 

0 

3 

10101 

061479 

0.06 

1 

0 

3 

20101 

062879 

0.10 

2 

A 

3 

10101 

061179 

1  .68 

2 

A 

3 

20101 

062579 

1.59 

2 

A 

3 

30101 

070979 

2.00 

2 

b 

3 

10101 

C60679 

1.83 

2 

b--3- 

20103 

0614-7-6 - 

-4*69 

2 

a 

3 

30101 

061879 

1.27 

2 

b 

3 

40101 

070279 

1.58 

2 

C 

3 

10101 

-  060669 -  - 

1*70 

2 

c 

3 

20101 

061879 

1.40 

2 

c 

3 

30101 

062179 

1.52 

2 

L 

3 

40W1 

07057-9- -  - - 

2*15 

2 

0 

3 

10101 

061479 

1.64 

2 

0 

3 

20101 

062879 

1  .84 

3 

* 

3- 

10101 

061179 - 

3.03- 

3 

A 

3 

20101 

062579 

2.69 

3 

A 

3 

30101 

070979 

4.00 

3 

a 

3 

10101 

06067  9  — -  - - 

3.26 

3 

a 

3 

20101 

061479 

3.58 

3 

b 

3 

30101 

061879 

2.81 

3 

b 

3 

40101 

070279 

3.32 

3 

c 

3 

10101 

060879 

2  .85 

3 

c 

3 

20101 

061879 

2.36 

3 

L 

3 

30101 

062179  -  —  - 

2.72 

3 

C 

3 

40101 

070579 

3  .61 

3 

0 

3 

10101 

061479 

3.79 

3 

0 

3 

20101 

062879 

3.44- 

A 

3 

10101 

060879 

5.99 

4 

A 

3 

20101 

061179 

6.37 

4 

A 

3 

30101 

-062579 

5.57 

4 

A 

J 

40101 

070979 

7.30 

4 

b 

3 

10101 

060679 

7.06 

4 

» 

3 

20101 

060679 

6.83 

4 

b 

3 

30101 

061179 

7.19 

4 

b 

3 

40101 

061879 

5  .91 

b 

3 

50101 

062579 

6.09 

4 

b 

3 

60101 

070279 

6  .49 



4  3  3  70101. 
4  0  3  00101 
*♦  C  3  10101 
4  C  3  20101 
4  C  3  30101 
4  c  3  40101 
413  50101 
4  13  60101 
413  70101 

4  C  3  30101 
403  10101 
403  20101 
403  30101 
403  40101 
403  50101 
6  A  3  101OI-- 

5  A  3  20101 
5  A  3  30101 
503  101 01 
5  d  3  20101 
533  30101 
5-1  3  10101 

5  C  3  20101 
513  30101 
503  10101 
503  20101 
0  A  3  10101 
o  A  3  20101 
0  A  3  30101 
033  10101 
0o3  20101 

6  b  3  30101 
0  C  3  10101 
013  20101 
013  30101 
003  10101 
003  20101 


07C579 

7.92 

070979 

7.80 

060P7  9 

5.66 

061179 

6.21 

062179 

5.23 

062179 

5.14 

062579 

3.44 

070279 

5.95 

070579 

7.02 

070979 

7.50 

061179 

6. 72 

061479 

7.57 

062579 

5.93 

062879 

7.76 

070979 

8.40 

-061179  .  - 

1  3.-40- 

062579 

12.00 

070979 

18.00 

C 60 679 -  - 

17.20 

061879 

10.80 

070279 

11.80 

06067  9- - - 

_.-.10.60-— 

062179 

1C. 70 

070579 

13.10 

161479 

13.60 

062879 

15.30 

061179 

24.10 

062579 

23.30 

070979 

30.00 

060679 

29.80 

061679- - 

2-6*60 - 

070279 

32.70 

060879 

24.20 

062 1 79 

--24*20— 

070579 

33.40 

061479 

21.20 

062679 

23.30 





INVESTIGATOR: DEFOE (02), TEST: C

DATA FROM EARLY LIFE STAGE TESTS WITH FATHEAD MINNOWS
SIX LEVELS: 1(CONTROL), 2(LOWEST), ..., 6(HIGHEST); 2 REPS EA (A,B)
# EMBRYOS TESTED, # LIVE AFTER HATCH, # NORMAL FRY AFTER HATCH
# FRY TESTED, # ALIVE AT END, # NORMAL AT END


I 

A 

I 

13201 

50 

29 

29 

20 

20 

20 

1 

8 

1 

13201 

50 

)  1 

It 

20 

20 

20 

2 

A 

1 

10201 

50 

U 

17 

23 

20 

?0 

2 

8 

1 

10201 

50 

)  1 

31 

20 

20 

23 

3 

A 

I 

10231 

50 

30 

33 

20 

20 

20 

3 

8 

I 

10201 

50 

1  3 

33 

20 

19 

18 

A 

A 

1 

13201 

50 

3A 

3A 

21 

21 

21 

A 

i 

l 

10201 

50 

29 

27 

20 

19 

19 

5 

A 

1 

10201 

50 

28 

23 

20 

16 

16 

5 

8 

1 

10201 

'  50 

33 

33 

20 

15 

15 

6 

A 

1 

10201 

50 

31 

33 

20 

00 

00 

6 

B 

1 

10201 

50 

31 

00 

20 

03 

00 

INVESTIGATOR: DEFOE (02), TEST: C

DATA FROM EARLY LIFE STAGE TESTS WITH FATHEAD MINNOWS
SIX LEVELS: 1(CONTROL), 2(LOWEST), ..., 6(HIGHEST); 2 REPS EA (A,B)
INDIVIDUAL WEIGHTS (MG) OF ALL FISH ALIVE AT END OF TEST (2 CARDS OR LESS/CELL)
NUMBER OF WEIGHTS & LIST OF WEIGHTS

1 

A 

2 

13201 

23 

155 

139 

106 

170 

166 

179 

1  2  A 

132 

369 

395 

113 

1 83 

3)2 

1 

A 

2 

23201 

20 

1A  5 

163 

17A 

139 

153 

172 

133 

1 

B 

2 

10201 

20 

Oil 

1 A  7 

172 

173 

17A 

151 

200 

103 

1 2  A 

238 

173 

1  A5 

170 

i 

B 

2 

23201 

20 

190 

165 

163 

089 

230 

135 

071 

2 

A 

2 

10201 

20 

1 2  A 

1A  3 

312 

109 

1  A  1 

14? 

1C1 

1 2  v 

162 

129 

153 

187 

128 

2 

A 

2 

20201 

20 

191 

IV  9 

126 

103 

133 

19? 

080 

2 

B 

2 

10201 

20 

091 

192 

196 

21A 

193 

197 

09  5 

?23 

116 

1 6  A 

162 

2  00 

1  90 

2 

B 

2 

20201 

20 

161 

236 

200 

190 

19? 

137 

181 

3 

A 

2 

10201 

23 

171 

1A  8 

095 

179 

0?  6 

160 

1  ?  9 

157 

179 

15  3 

102 

145 

137 

3 

A 

2 

20201 

20 

131 

159 

1AA 

152 

11? 

177 

198 

3 

B 

2 

10231 

IB 

170 

153 

151 

165 

131 

13A 

IV9 

152 

162 

137 

120 

129 

131 

3 

B 

2 

20201 

IB 

100 

096 

129 

156 

0A7 

A 

A 

2 

10201 

21 

097 

133 

099 

122 

139 

16? 

099 

135 

165 

l  A  A 

122 

152 

073 

A 

A 

2 

20201 

21 

193 

1A  A 

IA5 

1A5 

139 

226 

150 

DA  6 

A 

B 

2 

10201 

19 

115 

111 

073 

099 

133 

0  8  A 

132 

iia’ 

364 

102 

093 

091 

131 

A 

B 

2 

20201 

19 

177 

063 

12A 

090 

109 

119 

5 

A 

2 

10201 

16 

037 

056 

072 

0A7 

056 

053 

037 

053 

044 

007 

029 

0A7 

049 

5 

A 

2 

20201 

16 

037 

03  A 

35  9 

5 

B 

2 

13201 

15 

029 

3>0 

339 

055 

035 

033 

057 

335 

366 

3  A  5 

050 

063 

033 

5 

B 

2 

20201 

15 

OA  5 

038 

6 

A 

2 

10201 

30 

6 

B 

2 

10201 

00 

INVESTIGATOR: DEFOE (02), TEST: C

DATA FROM EARLY LIFE STAGE TESTS WITH FATHEAD MINNOWS
SIX LEVELS: 1(CONTROL), 2(LOWEST), ..., 6(HIGHEST); 2 REPS EA (A,B)
MEASURED CONCENTRATIONS OF TOXICANT (NOTE NO UNITS GIVEN)


A 

3 

13231 

090579 

000.356 

A 

3 

20201 

093779 

3CC.365 

A 

3 

30201 

091379 

000. 07A 

A 

3 

40201 

091279 

000. lib 

A 

3 

50201 

09 1 A  79 

OCL.  )68 

A 

3 

60201 

091979 

000.323 

A 

3 

73231 

092979 

000.324 

A 

3 

80201 

130279 

000.033 

B 

3 

10201 

090679 

000.076 

B 

3 

23201 

091179 

000. 05A 



e 

3 

33201 

091373 

300.045 

e 

3 

43201 

091779 

000.318 

a 

3 

50201 

0)2073 

000.042 

3 

63201 

0)277) 

000.326 

3 

70201 

100177 

000.324 

B 

3 

80201 

100379 

ooo. on 

3 

10201 

070579 

001. 35 

3 

20201 

090779 

1.02 

2 

3 

30201 

031073 

2.54 

2 

3 

40201 

09127) 

2.24 

2 

3 

50201 

091479 

1.04 

2 

3 

60201 

09197) 

1.03 

2 

3 

70201 

032877 

1.96 

2 

1 

80201 

103273 

2.02 

2 

8 

3 

10201 

690679 

1.78 

2 

8 

3 

20201 

091179 

2.34 

2 

8 

3 

30201 

091377 

2.15 

2 

8 

3 

40201 

091779 

1.  )4 

2 

8 

3 

53201 

0)2077 

1.98 

2 

8 

3 

60201 

0)2779 

1.54 

2 

8 

3 

70231 

130179 

2.01 

2 

8 

3 

00201 

100377 

2.01 

3 

4 

3 

10201 

030579 

7.47 

3 

A 

3 

20201 

030779 

5.66 

3 

A 

3 

30201 

031073 

4.»1 

3 

A 

3 

40201 

091273 

4.28 

3 

A 

3 

50201 

091479 

6.16 

3 

A 

3 

60201 

091979 

6.55 

3 

A 

3 

70201 

0920 19 

4.73 

3 

A 

3 

80201 

100279 

6.03 

3 

8 

3 

10201 

090679 

5.22 

3 

8 

3 

20201 

091179 

7.06 

3 

a 

3 

33201 

091379 

8.40 

3 

a 

3 

40201 

091779 

4.  73 
APPENDIX AVII. THEORETICAL BASES OF CHI SQUARE AND PROBIT BASED TESTS
OF TANK TO TANK VARIATION WITHIN TREATMENT GROUPS


A. Chi Square Test for Heterogeneity of Tanks Within Treatment Groups

Suppose that there are I treatment (or control) groups and J
tanks per treatment group. Let $p_i$ denote the probability of
death within the i-th group, i = 1, 2, ..., I, assuming homogeneity
of tanks within groups. The heterogeneity chi square procedure
tests the hypothesis

$H_0: p_1 = p_2 = \cdots = p_I = p$.

Let $N_{ij}$, $X_{ij}$ denote the number of fish and the number of dead
fish respectively in the j-th tank of the i-th group. Let
$X_{i+} = \sum_j X_{ij}$, $X_{++} = \sum_i\sum_j X_{ij}$, $N_{i+} = \sum_j N_{ij}$,
$N_{++} = \sum_i\sum_j N_{ij}$, $\hat p_{ij} = X_{ij}/N_{ij}$, $\hat p_i = X_{i+}/N_{i+}$, and
$\hat p = X_{++}/N_{++}$.

The chi square test of homogeneity based on each tank separately is
based on the statistic

$\chi_1^2 = \sum_{i=1}^{I}\sum_{j=1}^{J} \frac{(X_{ij} - N_{ij}\hat p)^2}{N_{ij}\hat p(1 - \hat p)}$ with IJ - 1 d.f.

The chi square test of homogeneity based on tanks pooled within
treatment groups is based on the statistic

$\chi_2^2 = \sum_{i=1}^{I} \frac{(X_{i+} - N_{i+}\hat p)^2}{N_{i+}\hat p(1 - \hat p)}$ with I - 1 d.f.

The difference of these two statistics,

$\chi_1^2 - \chi_2^2 = \sum_{i=1}^{I}\sum_{j=1}^{J} \frac{(X_{ij} - N_{ij}\hat p_i)^2}{N_{ij}\hat p(1 - \hat p)}$ with I(J - 1) d.f.,

is thus a test of homogeneity of response rates across tanks within
treatment groups and has a nominal chi square distribution with
I(J - 1) degrees of freedom if such homogeneity exists. Note, however,
that the weights in the denominator of this "chi square" statistic have
been calculated under the null hypothesis that $p_1 = p_2 = \cdots = p_I = p$.
Thus if the response probabilities differ among treatment groups, as we
have seen is the case with respect to fry mortality, then the weights are
incorrect and the nominal null distribution is inappropriate. Thus this
procedure leaves something to be desired.
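The three statistics above can be computed together, which also makes the additive decomposition $\chi_1^2 = \chi_2^2 + (\chi_1^2 - \chi_2^2)$ easy to check. The Python sketch below assumes every group has the same number of tanks J; the function name and the death counts in the example are hypothetical, chosen only to exercise the formulas. Comparing the within-group statistic with a chi square critical value on I(J - 1) d.f. is left to the reader, subject to the caveat about the pooled weights noted above.

```python
def heterogeneity_chi_square(dead, fish):
    """Tank-level, group-level and within-group chi square statistics.

    dead[i][j] and fish[i][j] are X_ij and N_ij for tank j of group i; every
    group is assumed to have the same number of tanks J."""
    x_total = sum(sum(row) for row in dead)
    n_total = sum(sum(row) for row in fish)
    p = x_total / n_total          # pooled estimate p-hat under H0
    weight = p * (1.0 - p)         # the same weight is used for every term
    chi_tanks = chi_groups = chi_within = 0.0
    for xs, ns in zip(dead, fish):
        x_i, n_i = sum(xs), sum(ns)
        p_i = x_i / n_i            # group estimate p-hat_i
        chi_groups += (x_i - n_i * p) ** 2 / (n_i * weight)
        for x, n in zip(xs, ns):
            chi_tanks += (x - n * p) ** 2 / (n * weight)
            chi_within += (x - n * p_i) ** 2 / (n * weight)
    groups, tanks = len(fish), len(fish[0])
    return chi_tanks, chi_groups, chi_within, groups * (tanks - 1)


# Hypothetical deaths out of 15 fish per tank: 3 groups, 4 tanks each.
dead = [[1, 2, 0, 1], [3, 1, 2, 4], [6, 4, 7, 5]]
fish = [[15] * 4 for _ in range(3)]
chi1, chi2, chi_within, df = heterogeneity_chi_square(dead, fish)
print(f"chi1 - chi2 = {chi1 - chi2:.3f}, within = {chi_within:.3f}, df = {df}")
```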


Appendix  AVII  is  the  appendix  for  Section  VII. 
B. Probit Model Based Test for Heterogeneity of Tanks Within
Treatment Groups

Let $X_{ij}$, $N_{ij}$, $X_{i+}$, $N_{i+}$, $\hat p_{ij}$, $\hat p_i$ have the same interpretation
as in subsection A. Let $\pi_i = \Phi(\alpha + \beta C_i)$ denote the response
probability in the i-th group, where $C_i$ is some function of the
toxicant concentration in the i-th group, $\Phi$ is the standard
normal c.d.f., and $\alpha$, $\beta$ are model parameters to be estimated from
the data. Let $\hat\pi_i = \Phi(\hat\alpha + \hat\beta C_i)$ denote the maximum likelihood
estimate of $\pi_i$.

Finney's suggestion is to compare test statistics for lack of
fit to the probit model based on each tank separately and based
on tanks pooled within treatment groups. The lack of fit test
based on each tank separately is

$\chi_1^2 = \sum_{i=1}^{I}\sum_{j=1}^{J} \frac{(X_{ij} - N_{ij}\hat\pi_i)^2}{N_{ij}\hat\pi_i(1 - \hat\pi_i)}$ with IJ - 2 d.f.

The lack of fit test based on tanks pooled within treatment groups
is

$\chi_2^2 = \sum_{i=1}^{I} \frac{(X_{i+} - N_{i+}\hat\pi_i)^2}{N_{i+}\hat\pi_i(1 - \hat\pi_i)}$ with I - 2 d.f.

The difference of these two statistics,

$\chi_1^2 - \chi_2^2 = \sum_{i=1}^{I}\sum_{j=1}^{J} \frac{(X_{ij} - N_{ij}\hat p_i)^2}{N_{ij}\hat\pi_i(1 - \hat\pi_i)}$ with I(J - 1) d.f.,

is again a test of homogeneity of response rates across tanks within
treatment groups and has a nominal chi square distribution with I(J - 1)
degrees of freedom if such homogeneity exists. Note that the weights in
the denominator of the statistic are based on the fitted probit model and
depend for their validity on the validity of this model.
Figure VIII.3  EXAX2 output
Figure VIII.6  EXAX2 output
Figure VIII.7  EXAX2 output
Figure VIII.8  EXAX2 output
Figure VIII.9  EXAX2 output
Figure VIII.10  EXAX2 output
Figure VIII.11  EXAX2 output
Figure VIII.13  EXAX2 output
Figure VIII.16  EXAX2 output
Figure VIII.17  EXAX2 output
Figure VIII.18  EXAX2 output
Figure VIII.19  EXAX2 output
Figure VIII.21  EXAX2 output
Figure VIII.22  EXAX2 output
Figure VIII.23  EXAX2 output
Figure VIII.25  EXAX2 output
Figure VIII.26  EXAX2 output
Figure VIII.30  EXAX2 output
Figure VIII.31  EXAX2 output
Figure VIII.32  EXAX2 output
Figure VIII.33  EXAX2 output
Figure VIII.34  EXAX2 output


APPENDIX AVIII.2*  DESCRIPTION AND INSTRUCTIONS FOR USE OF EXAX2
COMPUTER PROGRAM

*Appendix AVIII.2 is an appendix for Section VIII.


EXAX2: A COMPUTER PROGRAM TO
COMPARE BINOMIAL PROPORTIONS*


Program  Description  and  Card  Input  Information 


Paul  I.  Feder  and  Susan  A.  Willavize 
Department  of  Statistics 
The  Ohio  State  University 


September  12,  1980 


*  This  work  was  performed  with  the  support  of  the  U.S.  Army  Medical  Bioengineering 
Research  and  Development  Laboratory,  Frederick,  Maryland  under  Contract  DAMD17-79- 
C-9150  at  The  Ohio  State  University. 
Introduction


Dichotomous data arise in many fields of application. In fish toxicology
experiments it is of interest whether an embryo hatches into a live fry, whether
the fry has survived 27 days post hatching, and whether the surviving fry are normal
or abnormal. In toxicology experiments on higher animals, such as mammals, the
presence or absence of specific types of tumors or deformities is of importance.

Such responses naturally give rise to 0-1, or success-failure, data. Such 0-1
data are also encountered in many other fields. For example, in industrial
applications it is noted whether or not a unit meets design specifications or
whether or not a unit lasts beyond the warranty period. In sociological
applications it is of interest to note whether an individual exhibits certain behavior
patterns, has specific opinions, etc. A myriad of additional applications could
be cited.

Part of the analysis of success-failure data involves estimating the
probabilities of "success" within various groups and comparing these probabilities across
groups. EXAX2 is a computational tool to assist in carrying out such comparisons.
Since the need for the program was motivated by problems arising in aquatic
toxicology, the remainder of this description centers on such applications.

In toxicity tests on fish or daphnids, a number of test concentrations are run
along with one or more control groups. Within each concentration group (treatment
or control) several tanks or beakers are run, each chamber containing a number of
the organisms under study. Within each chamber the number of embryos, the number of
fry hatched live or normal, and the number of fry surviving or normal at the conclusion
of the experiment are recorded. It is desired to compare the proportions of live or
normal embryos or fry in the various treatment groups with the corresponding proportions
in the control groups.
A preliminary inference of importance is to test the response proportions among
the tanks or beakers within each group for homogeneity. If there is no evidence of
tank-to-tank heterogeneity within concentration groups, the data can be pooled across
tanks within groups and further analyses carried out based on binomial theory.

However, if there is evidence of tank-to-tank heterogeneity within groups, subsequent
analyses must be adjusted to reflect this, either by adjusting the model or the data
or by carrying out analyses on a per-tank basis.

EXAX2 carries out tests of homogeneity of tanks within groups based on the chi
square statistic. If the expected response frequencies are "large enough", the
distribution of the test statistic is approximated by large sample chi square theory.
If the expected response frequencies are not large enough for asymptotic theory to
be applicable, the test statistic is evaluated based on its exact small sample
distribution, derived from the exact small sample distribution of the contingency
table conditional on the margins (March, 1972). Individual tests of homogeneity
within groups are combined by means of Fisher's method (Littell and Folks, 1971,
1973) to obtain an overall test of homogeneity.

EXAX2 also has the capability to test for heterogeneity of response rates across
treatment groups based on responses pooled within groups. Either the exact small
sample or the approximate large sample distribution of the chi square statistic is
used. Heterogeneity among tanks within groups can be accounted for either by
modifying the chi square statistic by a heterogeneity factor (Finney, 1971) or by
modifying the data to "effective frequencies" that reflect within-tank correlation.

EXAX2 can calculate exact confidence intervals on the odds ratio of a treatment
group response rate as compared with a control group response rate, based on the
non-null distribution of Fisher's exact test for 2x2 tables (Thomas, 1971). The
confidence interval calculations are based on responses pooled across tanks within
groups. If there is heterogeneity among tanks within groups, the response frequencies
would need to be modified to "effective frequencies" to reflect within-tank correlation.

Section  II  discusses  program  organization  and  capabilities  and  provides  a  more 
detailed  description  of  the  program's  procedures.  Section  III  contains  detailed 
instructions  for  card  input. 

II  Program  Organization  and  Capabilities 

Suppose that the aquatic toxicity test consists of N treatment groups (including
both test and control groups) and K tanks or beakers per group. (EXAX2 can handle
different values of K for each group; however, we assume a single K value for
notational convenience.) Thus the responses within each treatment group can be
summarized as a 2xK contingency table. The rows represent the response categories
(e.g. dead/live or abnormal/normal). Each column represents the responses
from an individual tank. We wish to compare response probabilities across columns.

It is first necessary to specify the entries in the tables. A 2xK contingency
table can be specified as a 3x(K+1) matrix partitioned as follows:


               Col 1    Col 2    ...   Col K   | Row Totals
   Row 1      X(1,1)   X(1,2)    ...   X(1,K)  |   R(1)
   Row 2      X(2,1)   X(2,2)    ...   X(2,K)  |   R(2)
   Col Total   C(1)     C(2)     ...    C(K)   |    XN

where the X(I,J) comprise the "body" of the table; R(1) and R(2) are row totals;
C(1), C(2), ..., C(K) are the column totals; and XN is the grand total.

The information in the first K columns of each matrix is input one column
at a time, either by specifying the body of the table (X(1,1), X(2,1), ..., X(1,K),
X(2,K)), or the column totals and the first row (C(1), X(1,1), ..., C(K), X(1,K)),
or the column totals and the second row (C(1), X(2,1), ..., C(K), X(2,K)) (see
Section III, Instructions for Card Input). The remaining elements are computed and
the complete matrix is printed. A single matrix or several matrices must be input,
depending on the purpose of the analysis.
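The following minimal Python sketch shows how the remaining matrix elements follow from each of the three input forms. The function name `build_table` and its argument layout are hypothetical and do not correspond to EXAX2's FORTRAN routines. The `idata` codes mirror the IDATA options on the Parameter Card described in Section III.

```python
def build_table(cards, idata):
    """Build the 3 x (K+1) matrix for one group from its K Data Card pairs.

    idata = 1: each pair is (C(J), X(1,J))
    idata = 2: each pair is (C(J), X(2,J))
    idata = 3: each pair is (X(1,J), X(2,J))
    Returns [[X(1,1..K), R(1)], [X(2,1..K), R(2)], [C(1..K), XN]].
    """
    row1, row2 = [], []
    for a, b in cards:
        if idata == 1:        # column total and first row given
            row1.append(b)
            row2.append(a - b)
        elif idata == 2:      # column total and second row given
            row1.append(a - b)
            row2.append(b)
        elif idata == 3:      # body of the table given
            row1.append(a)
            row2.append(b)
        else:
            raise ValueError("idata must be 1, 2 or 3")
    cols = [u + v for u, v in zip(row1, row2)]
    return [row1 + [sum(row1)], row2 + [sum(row2)], cols + [sum(cols)]]


print(build_table([(20, 3), (20, 5)], idata=1))
# [[3, 5, 8], [17, 15, 32], [20, 20, 40]]
```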
Each input matrix is first examined to detect and adjust for the following
conditions, if they exist:

1.) If any column totals are zero, those columns are deleted, K is reduced
accordingly, and the new matrix is printed.

2.) If any row total is zero or if only one column total is nonzero, then
that table is the only possible table with the given row and column totals. As
such it is defined to be degenerate, and the observed level of significance (for
future heterogeneity tests) is set to 1. A message to this effect is printed.

3.) Steps 1 and 2 above are repeated for each succeeding input contingency
table.
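These two screening checks can be sketched in Python as follows. The function name `screen_table` is hypothetical, and each table is passed as its two rows.

```python
def screen_table(row1, row2):
    """Apply the two checks to one 2 x K table given as its two rows.

    Returns (row1, row2, degenerate); a degenerate table gets significance level 1.
    """
    # Check 1: delete columns whose total is zero (K shrinks accordingly)
    kept = [(a, b) for a, b in zip(row1, row2) if a + b > 0]
    r1 = [a for a, _ in kept]
    r2 = [b for _, b in kept]
    # Check 2: a zero row total, or at most one nonzero column, leaves only one possible table
    degenerate = sum(r1) == 0 or sum(r2) == 0 or len(kept) <= 1
    return r1, r2, degenerate


print(screen_table([3, 0, 5], [17, 0, 15]))  # ([3, 5], [17, 15], False)
print(screen_table([0, 0], [4, 6]))          # ([0, 0], [4, 6], True)
```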

We first consider tests of homogeneity within treatment groups and later
discuss tests across treatment groups.

If a table is not degenerate, the expected cell frequencies EX(I, J) are
calculated as EX(I, J) = R(I) C(J) / XN (I = 1, 2; J = 1, ..., K), and XSQ is
calculated as

$$\mathrm{XSQ} = \sum_{J=1}^{K} \sum_{I=1}^{2} \frac{\big(X(I,J) - EX(I,J)\big)^2}{EX(I,J)}.$$

If K = 2, a correction for continuity is applied to improve the convergence of the
distribution of XSQ to its asymptotic chi square form. Namely, if K = 2,

$$\mathrm{XSQ} = \sum_{J=1}^{2} \sum_{I=1}^{2} \frac{\big(|X(I,J) - EX(I,J)| - 1/2\big)^2}{EX(I,J)}.$$

The  table  of  expected  frequencies  and  the  user  specified  cutoff  value,  CUTOFF  (for 
what  constitutes  a  "large"  expected  frequency  within  each  cell),  are  printed. 
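A minimal Python sketch of these two formulas follows. The function name `xsq` is hypothetical. The sketch applies the continuity correction exactly as written for K = 2, without truncating |X(I,J) - EX(I,J)| - 1/2 at zero.

```python
def xsq(row1, row2):
    """Expected frequencies EX(I, J) and XSQ for a 2 x K table (rows given as lists)."""
    cols = [a + b for a, b in zip(row1, row2)]
    rows = [sum(row1), sum(row2)]
    xn = sum(cols)
    k = len(cols)
    ex = [[rows[i] * cols[j] / xn for j in range(k)] for i in range(2)]  # R(I) C(J) / XN
    obs = [row1, row2]
    total = 0.0
    for i in range(2):
        for j in range(k):
            dev = abs(obs[i][j] - ex[i][j])
            if k == 2:
                dev -= 0.5    # continuity correction, used only when K = 2
            total += dev ** 2 / ex[i][j]
    return total, ex


value, ex = xsq([5, 10, 15], [15, 10, 5])   # every EX(I, J) is 10.0 here
print(value)                                 # 10.0
```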

The table of expected frequencies is then examined to see if any of the
expected frequencies are less than CUTOFF, e.g. 5. If none is, XSQ is considered to be
asymptotically distributed as chi square with K-1 degrees of freedom, and its significance
level (A_i) is calculated based on chi square theory. The observed chi square value
and its significance level (A_i) are printed. If one or more of the expected
frequencies is less than CUTOFF, the exact distribution of the XSQ statistic is
calculated. This is done by enumerating all possible tables with the given row
and column totals (algorithm due to Boulton and Wallace, 1973) and their associated
chi square values. Under the assumption of homogeneity of response probabilities
across columns, the probability of each possible table, conditional on the row and
column margins, is (March, 1972)


$$P = \frac{R(1)!\,R(2)!\,\prod_{J=1}^{K} C(J)!}{XN!\,\prod_{I=1}^{2}\prod_{J=1}^{K} X(I,J)!}$$


From this exact distribution over possible tables the exact distribution of XSQ
is derived. Based on this derived distribution, the significance level (A_i) of the
XSQ value associated with the observed table (XSQOBS) is calculated as the probability
of an XSQ value greater than or equal to XSQOBS. The observed XSQ value and
significance level (A_i) are printed and, optionally (see Instructions for Card Input),
the exact XSQ distribution is printed. The entire process is then repeated on the
next contingency table.
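The exact and asymptotic branches can be sketched in Python as follows. For clarity, the sketch uses plain recursive enumeration rather than the Boulton and Wallace (1973) algorithm used by EXAX2. It writes each table probability in the equivalent form prod_J comb(C(J), X(1,J)) / comb(XN, R(1)). The helper names, the tie tolerance, and the closed-form tail `chi2_sf` (valid only for integer degrees of freedom) are assumptions of this sketch. Naive enumeration grows quickly with K and the margins, so it is only suitable for small tables.

```python
import math


def _xsq(row1, cols):
    """XSQ for the table with first row `row1` and column totals `cols`."""
    r1, xn = sum(row1), sum(cols)
    r2 = xn - r1
    corr = 0.5 if len(cols) == 2 else 0.0          # continuity correction when K = 2
    total = 0.0
    for x1, c in zip(row1, cols):
        for obs, r in ((x1, r1), (c - x1, r2)):
            e = r * c / xn
            total += (abs(obs - e) - corr) ** 2 / e
    return total


def _first_rows(cols, r1):
    """Yield every first row with 0 <= X(1,J) <= C(J) that sums to R(1)."""
    if not cols:
        if r1 == 0:
            yield []
        return
    c, rest = cols[0], cols[1:]
    for x in range(max(0, r1 - sum(rest)), min(c, r1) + 1):
        for tail in _first_rows(rest, r1 - x):
            yield [x] + tail


def exact_distribution(cols, r1):
    """(XSQ, probability) for every table with the given margins, under homogeneity."""
    denom = math.comb(sum(cols), r1)
    return [(_xsq(t, cols),
             math.prod(math.comb(c, x) for c, x in zip(cols, t)) / denom)
            for t in _first_rows(cols, r1)]


def chi2_sf(x, df):
    """Upper-tail chi square probability for an integer df (closed forms)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total


def significance_level(row1, row2, cutoff=5.0):
    """Return (XSQOBS, A, method) for one non-degenerate 2 x K table."""
    cols = [a + b for a, b in zip(row1, row2)]
    r1, xn = sum(row1), sum(cols)
    obs = _xsq(row1, cols)
    smallest_ex = min(r * c / xn for r in (r1, xn - r1) for c in cols)
    if smallest_ex >= cutoff:
        return obs, chi2_sf(obs, len(cols) - 1), "asymptotic"
    tol = 1e-9 * max(1.0, obs)                      # guard against rounding in ties
    a = sum(p for v, p in exact_distribution(cols, r1) if v >= obs - tol)
    return obs, a, "exact"


print(significance_level([0, 4], [4, 0]))  # exact branch; A = 2/70
```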

After tests of homogeneity (asymptotic or exact) have been carried out on each
treatment group, the significance levels A_1, A_2, ..., A_N summarize the results
of the independent tests of homogeneity on each table. To obtain an overall
significance level, these independent A_i's are combined as follows:

For groups where the distribution of XSQ has been approximated by its asymptotic
chi square form, the null distribution of A_i is approximately uniform (0, 1). Thus
Y_i = -2 ln(A_i) has an approximate chi square distribution with 2 degrees of
freedom, with mean E(Y_i) = EY_i = E(-2 ln(A_i)) = 2 and variance
Var(Y_i) = VARY_i = Var(-2 ln(A_i)) = 4.

For groups where the exact small sample distribution of XSQ has been used, A_i and
Y_i = -2 ln(A_i) have discrete null distributions derived from the null distribution
of the contingency table. The mean E(Y_i) = EY_i and variance Var(Y_i) = VARY_i are
calculated from the exact distribution of Y_i = -2 ln(A_i).


In either case (i.e. exact or asymptotic) the values of Y_i, EY_i, and VARY_i are
calculated and printed along with the other results for each table. The test
statistic for the overall test of tank-to-tank homogeneity within groups is

$$Z = \frac{\sum_{i=1}^{N} Y_i - \sum_{i=1}^{N} EY_i}{\sqrt{\sum_{i=1}^{N} VARY_i}}.$$

It is calculated and printed at the end of the output. Under the null hypothesis
that the tables are all homogeneous, Z has an approximate standard normal
distribution. The null hypothesis is rejected for large values of Z.
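The combination step can be sketched in Python, assuming the overall statistic is the standardized sum Z = (sum of Y_i - sum of EY_i) / sqrt(sum of VARY_i). The helper names are hypothetical. For an exact test, A is a step function of the observed table, so EY_i and VARY_i are obtained by averaging over every table with the observed margins. For asymptotic groups, one uses EY_i = 2 and VARY_i = 4 directly.

```python
import math


def y_moments_exact(dist, tol=1e-9):
    """EY and VARY for Y = -2 ln(A) when A comes from an exact XSQ distribution.

    dist: (XSQ, probability) pairs for every table with the observed margins.
    """
    ey = ey2 = 0.0
    for v, p in dist:
        a = sum(q for w, q in dist if w >= v - tol * max(1.0, v))  # A if v were observed
        y = -2.0 * math.log(a)
        ey += p * y
        ey2 += p * y * y
    return ey, ey2 - ey * ey


def combined_z(ys, eys, varys):
    """Standardized sum of the Y_i; large values indicate tank-to-tank heterogeneity."""
    return (sum(ys) - sum(eys)) / math.sqrt(sum(varys))


# Exact null distribution of XSQ for a 2 x 2 table with all margins 4 (continuity-corrected)
dist = [(4.5, 1 / 70), (0.5, 16 / 70), (0.5, 36 / 70), (0.5, 16 / 70), (4.5, 1 / 70)]
ey, vary = y_moments_exact(dist)
print(ey, vary)   # EY is far below 2, the value for a continuous (asymptotic) A_i
```

The example shows why the exact moments matter. A discrete A_i seldom takes very small values, so using EY_i = 2 for it would bias Z downward.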

We  now  consider  additional  applications  of  EXAX2.  The  program  has  the 
capability  to  carry  out  chi  square  tests  of  homogeneity  across  treatment  groups  and 
to  construct  exact  confidence  intervals  on  the  odds  ratios  of  treatment  groups 
compared  with  the  control.  These  applications  are  discussed  in  turn,  beginning 
with  tests  of  homogeneity  across  groups. 

If  preliminary  tests  do  not  reveal  heterogeneity  among  tanks  within  treatment 
groups,  then  it  is  appropriate  to  sum  the  observed  frequencies  in  the  individual 
tanks  within  each  treatment  group.  This  results  in  a  new  2xN  contingency  table 
which can be tested for homogeneity across treatment groups. EXAX2 can perform
the  appropriate  summing  within  groups  and  then  proceed  with  the  chi  square  test 
across  groups,  based  either  on  exact  or  asymptotic  theory,  as  discussed  above. 

Since  just  one  contingency  table  is  involved  in  this  application,  the  Z  statistic 
is  not  computed. 
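The pooling step is a column sum within each group. The hypothetical Python helper below forms the 2xN table whose rows can then be tested for homogeneity across groups.

```python
def pool_tanks(groups):
    """Sum tank counts within each group to form the 2 x N table across groups.

    groups[g] is a list of (X(1,J), X(2,J)) pairs, one per tank in group g.
    """
    row1 = [sum(a for a, _ in tanks) for tanks in groups]
    row2 = [sum(b for _, b in tanks) for tanks in groups]
    return row1, row2


print(pool_tanks([[(3, 17), (5, 15)], [(8, 12), (12, 8)]]))  # ([8, 20], [32, 20])
```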

EXAX2 has the capability to compare, on a pairwise basis, the odds (p/(1-p))
within each treatment group to the odds in a user specified group, e.g. the control
group. Using an algorithm given by Thomas (1971), an exact confidence interval is
computed for each odds ratio (one per treatment group). The user specifies both
the upper and lower alpha levels, thus permitting either one sided or two sided
confidence intervals.


To illustrate how this works, consider the frequencies for treatment 1 (the
control group) and treatment T as forming a 2x2 contingency table:

               Col 1     Col T    | Row Totals
   Row 1      X(1,1)    X(1,T)    |   Y(1)
   Row 2      X(2,1)    X(2,T)    |   Y(2)
   Col Totals  C(1)      C(T)     |    YN

Y(1), Y(2), YN designate respectively the two row totals and the grand total of
this new table. The odds ratio PSI is defined as PSI = (P_1 Q_T) / (P_T Q_1),
where P_1, P_T are the category 2 probabilities (i.e. "success") within treatment
groups 1, T respectively, and Q_1 = 1 - P_1, Q_T = 1 - P_T are the category 1
probabilities within treatment groups 1, T respectively.

We estimate these quantities by

$$\hat{P}_1 = X(2,1)/C(1), \qquad \hat{P}_T = X(2,T)/C(T),$$

$$\hat{Q}_1 = X(1,1)/C(1) = 1 - \hat{P}_1, \qquad \hat{Q}_T = X(1,T)/C(T) = 1 - \hat{P}_T,$$

$$\widehat{PSI} = \frac{X(2,1)\,X(1,T)}{X(1,1)\,X(2,T)}.$$

Thomas' algorithm is an iterative technique for finding upper and lower
confidence bounds on PSI. It is based on the noncentral distribution of Fisher's
exact test statistic.
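The interval such an algorithm targets can be sketched in Python under the following assumptions. Conditional on all margins, X(2,1) has the noncentral hypergeometric distribution indexed by PSI. The lower limit solves P(X(2,1) >= observed) = ALPHAL, and the upper limit solves P(X(2,1) <= observed) = ALPHAU. This is not Thomas' iteration. It uses plain bisection on ln(PSI) over an assumed range of plus or minus 30, and the function name `odds_ratio_ci` is hypothetical.

```python
import math


def odds_ratio_ci(x11, x21, x1t, x2t, alphal=0.025, alphau=0.025):
    """Point estimate and conditional exact limits for PSI = (P_1 Q_T) / (P_T Q_1).

    Row 2 is the "success" category; column 1 is the reference group, column T the
    treatment. Given all margins, X(2,1) follows a noncentral hypergeometric law
    with P(a) proportional to comb(C(1), a) * comb(C(T), Y(2) - a) * PSI**a.
    """
    c1, ct, y2 = x11 + x21, x1t + x2t, x21 + x2t
    a_min, a_max = max(0, y2 - ct), min(c1, y2)
    support = range(a_min, a_max + 1)
    base = [math.log(math.comb(c1, a)) + math.log(math.comb(ct, y2 - a)) for a in support]

    def tail(log_psi, upper):
        # P(A >= x21) if upper else P(A <= x21); scaling by the max avoids overflow
        logw = [b + a * log_psi for a, b in zip(support, base)]
        m = max(logw)
        w = [math.exp(v - m) for v in logw]
        hit = sum(wi for a, wi in zip(support, w) if (a >= x21 if upper else a <= x21))
        return hit / sum(w)

    def solve(upper, target, lo=-30.0, hi=30.0):
        # Bisection on ln(PSI): the upper tail rises with PSI, the lower tail falls
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if (tail(mid, upper) < target) == upper:
                lo = mid
            else:
                hi = mid
        return math.exp(0.5 * (lo + hi))

    lower = 0.0 if x21 == a_min else solve(True, alphal)
    upper = math.inf if x21 == a_max else solve(False, alphau)
    # Point estimate, taken as infinite when X(1,1) X(2,T) = 0
    psi_hat = math.inf if x11 * x2t == 0 else (x21 * x1t) / (x11 * x2t)
    return psi_hat, lower, upper


print(odds_ratio_ci(10, 10, 10, 10))  # PSI-hat = 1; limits symmetric about 1 on the log scale
```

When the observed X(2,1) sits at the edge of its conditional range, the corresponding limit is 0 or infinite. This matches the behavior of conditional exact intervals generally.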

III  Instructions for Card Input

In  this  section  we  present  detailed  instructions  for  card  input  to  EXAX2. 

The card input consists of 6 to 11 Program Information Cards followed by the Data
Cards and the Alpha Card. The Program Information Cards are, in order: one Input
Option Card, one Parameter Card, one Format Card, one Header Card, one Labels Card,
and from one to six Title Cards. These cards must be punched as described below:

1. INPUT OPTION CARD  This card should have a 1 in card column 1 if the
subsequent data cards represent K tanks per treatment group and the tank data are
to be pooled (summed) within each treatment group to produce one 2xN contingency
table, which will then be analyzed. There should be a 2 in card column 1 if no
pooling is to occur. In this case, the data will be tested for homogeneity among
tanks within treatment groups.


2. PARAMETER CARD

Card Cols.   Description

1-5          K = the number of columns in each contingency table (a right
             justified integer). (K is the number of tanks per treatment
             group.) K should be less than or equal to 12 if the number on
             the Input Option Card is 2.

             CUTOFF = the smallest expected cell frequency with which the use
             of the asymptotic chi square approximation is permitted. If one
             or more expected cell frequencies are smaller than CUTOFF, then
             the exact small sample distribution of XSQ is used. (CUTOFF is a
             real number with decimal point.)

             blanks

             NTITLE = the number of Title Cards used (an integer from 1 to 6)

             N = the number of 2xK contingency tables, i.e. the number of
             groups (both treatment and control). If the number on the Input
             Option Card is 1, then N should be less than or equal to 12.

             blanks

             IOPT = 1 if the exact distribution of XSQ is to be printed,
                  = 0 otherwise.

             blanks

             IDATA = 1 if data card input is of the form C(I), X(1,I) (see
                       the description of the Data Cards given below for an
                       explanation of this notation),
                   = 2 if data card input is C(I), X(2,I),
                   = 3 if data card input is X(1,I), X(2,I).

             IC = group number corresponding to the group (control or
             treatment) to which each other treatment group is to be
             compared when calculating confidence intervals on the odds
             ratios (e.g. the number of the control group).

It should be noted with respect to the specification of K in columns 1-5 of
the Parameter Card that some groups may have L<K tanks. The associated contingency
tables  for  these  groups  thus  have  L<K  columns.  These  tables  must  be  augmented  with 
K-L  columns  of  zeros  by  inputting  K-L  Data  Cards  containing  zeros.  The  program 
will  later  delete  these  dummy  columns  and  perform  its  computations  based  only  on 
the  original  L  columns. 

3.  FORMAT  CARD 

This  card  contains  the  format  in  which  the  subsequent  data  cards  have  been 
punched,  e.g.  (F5.0,  F5.0),  (see  the  description  of  the  data  cards  given  below). 

Note:  No  I  or  A  formats  may  be  used!  All  80  card  columns  may  be  used.  The  format 
statement  may  be  positioned  anywhere  on  the  card  and  must  include  the  parentheses 
(see any FORTRAN text for an explanation of formats).

4.  HEADER  CARD 

This card contains a special heading (it may be blank) which will be printed above
the output for each contingency table (e.g. Embryo Mortality). Any or all of the
80 columns may be used.

5. LABELS CARD

Card Cols.   Description

             blanks

             A label for the first row of the contingency table

11-12        blanks

13-20        A label for the second row of the contingency table
             (e.g. "live", "dead")


6.  TITLE  CARDS 


These  cards  contain  titles  which  will  be  printed  above  the  output  for  each 
contingency  table.  Any  or  all  of  the  80  columns  on  each  card  may  be  used.  The 
number  of  Title  Cards  must  agree  with  the  entry  NTITLE  on  the  Parameter  Card.  If 
no  titles  are  desired  there  must  be  a  blank  card  here  and  NTITLE  must  equal  1. 

7.  DATA  CARDS 

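An observed 2xK contingency table can be specified as a 3x(K+1) matrix
partitioned as follows:

               Col 1    Col 2    ...   Col K   | Row Totals
   Row 1      X(1,1)   X(1,2)    ...   X(1,K)  |   R(1)
   Row 2      X(2,1)   X(2,2)    ...   X(2,K)  |   R(2)
   Col Totals  C(1)     C(2)     ...    C(K)   |    XN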

The X(I,J) comprise the "body" of the table; R(1) and R(2) are row totals; C(1),
C(2), ..., C(K) are the column totals; and XN is the grand total. Each Data Card
contains the contents of one of the first K columns of this matrix. This information
can be specified in one of three ways. If IDATA = 1 (on the Parameter Card), the
I-th Data Card must contain C(I), X(1,I). If IDATA = 2, the I-th Data Card must
contain C(I), X(2,I). If IDATA = 3, the I-th Data Card must contain X(1,I), X(2,I).
EXAX2 will compute the other matrix elements. The format given on the Format Card
(the third Program Information Card) specifies the format under which this
information will be read.

The  K  Data  Cards  for  the  first  contingency  table  are  followed  by  the  K  Data 
Cards  for  the  second  contingency  table  and  so  on  until  all  N  contingency  tables  have 
been  given.  There  will  thus  be  NxK  Data  Cards  in  all.  The  form  of  the  data  for 
each  contingency  table  must  be  consistent  with  the  entries  on  the  Program  Information 
Cards. 
8. ALPHA CARD

Card Cols.   Description

1-5          ALPHAL = the lower alpha level for the odds ratio confidence
             intervals

6-10         ALPHAU = the upper alpha level for the odds ratio confidence
             intervals.

These must be decimal numbers less than or equal to one, and their sum must be
less than or equal to one.

If the data consist of just one tank per group, or if EXAX2 is instructed to
pool data across tanks within groups, then confidence intervals on the odds ratios
are automatically computed. If the data consist of more than one tank per group
(i.e. K>1) and EXAX2 is instructed not to pool across tanks within groups, then
confidence intervals are not computed and this card may be omitted.

The input cards are placed at the end of the program deck between the
//GO.SYSIN DD * card and the /* card. The example below illustrates the input
stream.
In this example the tank data will be pooled within treatments, and 95% confidence
intervals (ALPHAL = 0.025, ALPHAU = 0.025) will be computed on the odds ratios
between treatment one and every other treatment. There are six treatments with 4
tanks each. The exact distribution of XSQ will be printed if it is used in the
homogeneity test, i.e., if any of the expected frequencies is less than 5.0 (CUTOFF).
There are five Title Cards. They precede the Data Cards, from which the total number
of embryos and the number alive will be read.


IV  Program  Limitations 

EXAX2  has  limitations  of  time  and  space.  The  space  limitations  are  due  to  the 
dimensioned  size  of  various  arrays.  When  pooling  tanks  within  treatments,  N  must 
be  less  than  or  equal  to  12  and  when  not  pooling  tanks,  K  must  be  less  than  or 
equal  to  12.  This  limit  can  be  raised  by  increasing  the  sizes  of  the  arrays  in 
the  following  table: 

Program Location        Arrays

MAIN program            X, C, EX, ICHECK
SUBROUTINE TABLE        X, C
SUBROUTINE INPUT        X, C
SUBROUTINE INPUT1       X, C
SUBROUTINE INPUT2       X, C
SUBROUTINE EXAX         X, RX, XSQT, D, V, U, DLIM


Depending upon the number of tables (with the given margins) that are enumerated
in generating the exact distribution of XSQ, it may also be necessary to increase
the sizes of the arrays SIG, XSQ, PROB and IPOINT in SUBROUTINE EXAX. Also, in
FUNCTION FUN (a part of Thomas' confidence interval algorithm) there is a
machine-dependent constant (DPC) which should be set in the DATA statement to the
largest real number the machine can hold.

In  some  cases  the  CPU  time  required  by  the  program  may  be  quite  large.  No 
systematic  study  has  been  performed  to  determine  when  this  is  so,  but  the  following 
two  examples  may  be  of  interest: 

1.) For a 2x10 table with column margins all equal to 50 and row margins of
46 and 454, the smallest expected cell frequency is 4.6. If CUTOFF = 5.0, the exact
distribution of XSQ will be generated, requiring the enumeration of over 435,000
tables and over 10 minutes of CPU time on the AMDAHL 470 V6 computer.

2.) For a 2x6 table with column margins all equal to 60 and row margins of
333 and 27, the smallest expected frequency is 4.5. If CUTOFF = 5.0, the exact
distribution of XSQ will be generated, requiring over 50 seconds of CPU time.
It should also be noted that, as Agresti and Wackerly (1977) point out,
generally for a fixed grand total the number of tables enumerated (and hence
the CPU time) in generating the exact distribution is much higher when the
column margins are equal than when they are unequal.
APPENDIX AX*  THEORETICAL BASES OF SUGGESTED OUTLIER DETECTION
TRANSFORMATIONS 

In  this  appendix  we  discuss  the  motivations  and  theoretical  bases 
underlying  the  outlier  detection  procedures  that  are  illustrated  in  the 
body  of  the  section. 

Suppose that the data originate from I treatment groups, J tanks
per group. Consider the i-th group. Let $X_{ij}$, $N_{ij}$ denote the number of
responses and the total number of fish in the j-th tank, j = 1, ..., J,
and let $\hat{p}_i$ denote the pooled estimated response rate
($\hat{p}_i = \sum_j X_{ij} / \sum_j N_{ij}$), with $\hat{q}_i = 1 - \hat{p}_i$.

In subsequent discussion we omit the subscript i for notational convenience,
and so these quantities are denoted as $\hat{p}$, $\hat{q}$, $X_j$, $N_j$, respectively.
Let $N = \sum_j N_j$.

Consider the chi square statistic for testing for tank-to-tank
heterogeneity within groups

$$X^2 = \sum_{j=1}^{J} \frac{(X_j - N_j\hat{p})^2}{N_j\hat{p}\hat{q}}.$$

This statistic is approximately distributed as chi square with J-1 d.f. under
the null hypothesis of no tank-to-tank heterogeneity. For the purpose of
detecting outlying responses we consider three cases:

Case 1: All expected frequencies within the group are greater
than a specific cutoff, e.g. 5.

Case 2: $\hat{p} < 0.1$ or $\hat{p} > 0.9$.

Case 3: $0.1 < \hat{p} < 0.9$ and one or more expected frequencies of
responses is less than the cutoff, e.g. 5.

We suggest a somewhat different transformation in each case.

Case 1. All the expected frequencies within the group exceed the cutoff

Consider the individual terms in the chi square test statistic, $(X_j - N_j \hat p)/[N_j \hat p \hat q]^{1/2}$. Assume that the weights in the denominator are "correct" and "fixed". The quantities $X_j$, $\hat p$ in the numerator are correlated since $\hat p$ includes $X_j$. It can be shown that the variance of $(X_j - N_j \hat p)/[N_j \hat p \hat q]^{1/2}$ is $1 - N_j/N$. If all the $N_j$'s are equal, this variance is $1 - 1/J$. If all the expected frequencies in the table exceed 5, as is the case with the Defoe compound C embryo mortality data, then the quantities

$$\left(1 - \frac{N_j}{N}\right)^{-1/2} \frac{X_j - N_j \hat p}{[N_j \hat p \hat q]^{1/2}}, \qquad j = 1, \ldots, J,$$

can be treated as having an approximate standard normal distribution. Graphical and numerical outlier detection procedures are based on these standardized ratios. We pool them across groups and plot the resulting IJ values on normal probability paper. However, for the purpose of formal inference we account approximately for the correlation among terms within groups (approximately $-1/(J - 1)$) by treating the J values within each group as if they were J - 1 independent values. This adjustment of course has the most impact when J = 2. The normality assumption might be enhanced by first carrying out an arc sine variance stabilizing transformation.

¹Appendix AX is an appendix for Section X.
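The Case 1 computation for a single group can be sketched in a few lines of Python. This is a minimal illustration, assuming the per-tank counts are supplied as lists; the function name and the counts are hypothetical and are not taken from the report's program.

```python
import math

def case1_standardized_residuals(x, n):
    """Chi square statistic and standardized ratios for one treatment group (Case 1).

    x[j]: number of responses in tank j; n[j]: number of fish in tank j.
    """
    N = sum(n)
    p_hat = sum(x) / N                      # pooled response rate for the group
    q_hat = 1.0 - p_hat
    chi_square = 0.0
    z = []
    for xj, nj in zip(x, n):
        ratio = (xj - nj * p_hat) / math.sqrt(nj * p_hat * q_hat)
        chi_square += ratio ** 2            # one term of the heterogeneity chi square
        z.append(ratio / math.sqrt(1.0 - nj / N))  # divide by sqrt of variance 1 - N_j/N
    return chi_square, z

# Hypothetical counts: 4 tanks of 50 fish; all expected frequencies exceed 5.
chi2, z = case1_standardized_residuals([12, 15, 30, 14], [50, 50, 50, 50])
```

The chi square value would be referred to J - 1 = 3 d.f.; the standardized values from all groups would then be pooled for the normal probability plot, with each group counted as J - 1 independent values for formal inference.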


Case 2. Expected response probability within the group is small, e.g. $p < 0.1$

Case 2 is also applicable to the situation when $p > 0.9$, by considering the complementary response.

The distribution of $X_j$ can be approximated by a Poisson distribution with mean $\lambda_j = N_j p$. The variance stabilizing transformation in the Poisson case is well known to be the square root transformation. In particular, $2(X_j^{1/2} - \lambda_j^{1/2})$ has an approximate standard normal distribution. We estimate $\lambda_j$ by $N_j \hat p = \hat\lambda_j$. Now $X_j$, $\hat\lambda_j$ are positively correlated since $\hat\lambda_j$ includes $X_j$. It can be shown by a Taylor expansion argument that the variance of $2(X_j^{1/2} - \hat\lambda_j^{1/2})$ is approximately $1 - N_j/N$. If all the $N_j$'s are equal, this variance is $1 - 1/J$.

Thus if $p < 0.1$ (or if $q < 0.1$), the quantities

$$\left(1 - \frac{N_j}{N}\right)^{-1/2} \times 2\left(X_j^{1/2} - (N_j \hat p)^{1/2}\right), \qquad j = 1, \ldots, J,$$

can be treated as having an approximate standard normal distribution. Graphical and numerical outlier detection procedures are based on these standardized values. We carry out the same types of analyses with these values as with the standardized ratios calculated under case 1. For formal inferences we account approximately for the correlation among terms within groups (approximately $-1/(J - 1)$), as we did in case 1, by treating the J values within each group as if they were J - 1 independent values.
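A minimal Python sketch of the Case 2 square root standardization, assuming the group has a small pooled response rate; the helper names and counts are hypothetical. For $\hat p > 0.9$ the same function would be applied to the complementary counts, as the text suggests.

```python
import math

def case2_standardized_values(x, n):
    """Square root (Poisson) standardized values for a low-response group."""
    N = sum(n)
    p_hat = sum(x) / N
    values = []
    for xj, nj in zip(x, n):
        lam_hat = nj * p_hat                        # estimated Poisson mean N_j * p_hat
        d = 2.0 * (math.sqrt(xj) - math.sqrt(lam_hat))
        values.append(d / math.sqrt(1.0 - nj / N))  # rescale by variance 1 - N_j/N
    return values

def complement_counts(x, n):
    """Non-responders per tank, for groups with p_hat > 0.9."""
    return [nj - xj for xj, nj in zip(x, n)]

# Hypothetical counts: pooled rate 10/240, well below 0.1.
vals = case2_standardized_values([1, 0, 2, 7], [60, 60, 60, 60])
```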
Case 3. $0.1 \le \hat p \le 0.9$ and one or more expected frequencies are less than the cutoff

We follow the suggestion of Barnett and Lewis and carry out the arc sine variance stabilizing transformation. Let $\hat p_j = X_j/N_j$. In particular, $2N_j^{1/2}[\arcsin(\hat p_j^{1/2}) - \arcsin(p^{1/2})]$ has an approximate standard normal distribution as $N_j \to \infty$. We estimate p by $\hat p$. It can be shown that the variance of $2N_j^{1/2}[\arcsin(\hat p_j^{1/2}) - \arcsin(\hat p^{1/2})]$ is approximately $1 - N_j/N$. If all the $N_j$'s are equal this variance is $1 - 1/J$. Thus the quantities

$$2N_j^{1/2}\left(1 - \frac{N_j}{N}\right)^{-1/2}\left[\arcsin(\hat p_j^{1/2}) - \arcsin(\hat p^{1/2})\right], \qquad j = 1, \ldots, J,$$

can be treated as having an approximate standard normal distribution. Graphical and numerical outlier detection procedures are based on these standardized values. We carry out the same types of analyses with these values as with the standardized ratios calculated under case 1. For formal inferences we account approximately for the correlation among terms within groups (approximately $-1/(J - 1)$), as we did in the previous cases, by treating the J values within each group as if they were J - 1 independent values.
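The arc sine standardization of Case 3 can be sketched the same way. The function name and the small-tank counts below are hypothetical; the example is chosen so that the pooled rate is 0.375 while the expected frequencies (3 per tank) fall below the cutoff of 5.

```python
import math

def case3_standardized_values(x, n):
    """Arc sine standardized values for a group with 0.1 <= p_hat <= 0.9."""
    N = sum(n)
    p_hat = sum(x) / N
    ref = math.asin(math.sqrt(p_hat))          # arcsin(p_hat^(1/2))
    values = []
    for xj, nj in zip(x, n):
        pj = xj / nj                           # tank-level response rate p_hat_j
        d = 2.0 * math.sqrt(nj) * (math.asin(math.sqrt(pj)) - ref)
        values.append(d / math.sqrt(1.0 - nj / N))
    return values

vals = case3_standardized_values([3, 4, 1, 4], [8, 8, 8, 8])
```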


APPENDIX  AXV.  CONFIDENCE  INTERVAL  ON  CONCENTRATION  THAT  CORRESPONDS  TO 
A  GIVEN  LEVEL  OF  INCREASE  IN  RESPONSE  OVER  CONTROL  GROUP 
RESPONSE. 

After we have fitted the nonlinear regression model we wish to calculate confidence bounds on the safe dose. We use Fieller's theorem.

Suppose we're willing to tolerate a response rate L above the control group rate, c.

We want a confidence interval on $d_{safe}$ such that

$$\Phi(\beta_0 + \beta_1 d_{safe}) = L(\beta_0, \beta_1, c).$$

The standard probit fit assumes that

$$p(\beta; d) = c + (1 - c)\,\Phi(\beta_0 + \beta_1 d), \qquad (1)$$

where c is the background rate.

We obtain $\hat\beta$ by a maximum likelihood fit of the model using a nonlinear regression program or using SAS PROC PROBIT.

We then wish to solve the equation

$$\Phi(\beta_0 + \beta_1 d_{safe}) = L. \qquad (2)$$

Thus,

$$\beta_0 + \beta_1 d_{safe} = \Phi^{-1}(L) = f_L. \qquad (3)$$


Appendix AXV is the appendix for Section XV.




The point estimate for $d_{safe}$ is

$$\hat d_{safe} = \frac{\Phi^{-1}(L) - \hat\beta_0}{\hat\beta_1}. \qquad (4)$$


Placing a confidence interval on $d_{safe}$ is now a direct application of Fieller's Theorem. See Mandel [44], page 279 or Graybill [45], pages 126-127.

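Thus for fixed d, a 1 - α confidence interval on $y_d = \beta_0 + \beta_1 d$ is

$$\hat y_d = \hat\beta_0 + \hat\beta_1 d \pm z_{\alpha/2}\sqrt{g + 2hd + jd^2},$$

where $g = \mathrm{Var}(\hat\beta_0)$, $j = \mathrm{Var}(\hat\beta_1)$, $h = \mathrm{Cov}(\hat\beta_0, \hat\beta_1)$.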
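The pointwise interval on $y_d$ can be computed directly from the fitted coefficients and their covariance entries. This is a minimal sketch using Python's `statistics.NormalDist` for $z_{\alpha/2}$; the function name and the numeric estimates are hypothetical.

```python
import math
from statistics import NormalDist

def linear_predictor_ci(b0, b1, g, h, j, d, alpha=0.05):
    """1 - alpha confidence interval on y_d = beta0 + beta1 * d.

    g = Var(b0), j = Var(b1), h = Cov(b0, b1).
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)    # upper alpha/2 point z_{alpha/2}
    y_hat = b0 + b1 * d
    se = math.sqrt(g + 2.0 * h * d + j * d * d)    # Var(b0 + b1 d) = g + 2hd + jd^2
    return y_hat - z * se, y_hat + z * se

lo, hi = linear_predictor_ci(-3.0, 0.4, 0.09, -0.01, 0.0016, 5.0)
```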


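The confidence interval on $d_{safe}$ includes all d's such that

$$f_L \in \hat\beta_0 + \hat\beta_1 d \pm z_{\alpha/2}\sqrt{g + 2hd + jd^2}, \qquad (5)$$

i.e.

$$(\hat\beta_0 + \hat\beta_1 d - f_L)^2 \le z_{\alpha/2}^2\,(g + 2hd + jd^2).$$

Thus the limits on $d_{safe}$ are obtained by solving the equations

$$f_L = \hat\beta_0 + \hat\beta_1 d \pm z_{\alpha/2}\sqrt{g + 2hd + jd^2}. \qquad (6)$$

Thus

$$(\hat\beta_0 + \hat\beta_1 d - f_L)^2 = z_{\alpha/2}^2\,(g + 2hd + jd^2). \qquad (7)$$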
A necessary and sufficient condition for Fieller's Theorem to yield a valid confidence interval for $d_{safe}$ is that

$$\hat\beta_1^2 - j\,z_{\alpha/2}^2 > 0. \qquad (8)$$

That is, $\hat\beta_1$ must be "many" (more than $z_{\alpha/2}$) standard error units away from 0.

Under the condition in equation (8) a 1 - α two sided confidence interval on $d_{safe}$ is

$$d_{safe} \in \frac{-B \pm \sqrt{B^2 - 4AC}}{2A}, \qquad (9)$$

where A, B, C are

$$A = \hat\beta_1^2 - j\,z_{\alpha/2}^2,$$

$$B = 2\left[\hat\beta_1(\hat\beta_0 - f_L) - h\,z_{\alpha/2}^2\right],$$

$$C = (\hat\beta_0 - f_L)^2 - g\,z_{\alpha/2}^2.$$

Suppose now we wish to calculate a 1 - α level lower bound on $d_{safe}$. This is the value that could be used for regulatory purposes. By an argument similar to the one above, it can be shown that the form of the confidence interval is the same as above, except that $z_{\alpha/2}$ is replaced by $z_\alpha$ and the smaller root in equation (9) is used. Thus to calculate a lower 95 percent confidence bound on $d_{safe}$ we use the lower end point of a 90 percent two sided confidence interval on $d_{safe}$, etc.
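A minimal Python sketch of the Fieller computation in equations (3), (4), (8) and (9), assuming the probit coefficients and their covariance entries are already available (for instance from SAS PROC PROBIT or a nonlinear regression program, as the text says). The function name and numbers are hypothetical. With `one_sided=True` it replaces $z_{\alpha/2}$ by $z_\alpha$; the smaller root is then the lower bound, which presumes $\hat\beta_1 > 0$ (response increasing with dose).

```python
import math
from statistics import NormalDist

def fieller_dsafe(b0, b1, g, h, j, L, alpha=0.05, one_sided=False):
    """Fieller limits on d_safe for a probit fit.

    b0, b1: estimated probit intercept and slope.
    g = Var(b0), j = Var(b1), h = Cov(b0, b1).
    L: probability in equation (2), so f_L = Phi^{-1}(L).
    Returns (lower root, point estimate, upper root). With one_sided=True only
    the lower root is meaningful, as a 1 - alpha lower bound.
    """
    nd = NormalDist()
    f_L = nd.inv_cdf(L)                                      # equation (3)
    z = nd.inv_cdf(1.0 - alpha) if one_sided else nd.inv_cdf(1.0 - alpha / 2.0)
    z2 = z * z
    A = b1 * b1 - j * z2
    if A <= 0:                                               # condition (8) fails
        raise ValueError("slope is not enough standard errors away from 0")
    B = 2.0 * (b1 * (b0 - f_L) - h * z2)
    C = (b0 - f_L) ** 2 - g * z2
    # Real when A > 0 and the covariance matrix of (b0, b1) is positive definite.
    disc = math.sqrt(B * B - 4.0 * A * C)
    lower = (-B - disc) / (2.0 * A)
    upper = (-B + disc) / (2.0 * A)
    point = (f_L - b0) / b1                                  # equation (4)
    return lower, point, upper

lo, point, hi = fieller_dsafe(-3.0, 0.4, 0.09, -0.01, 0.0016, 0.10)
```

Calling it with `alpha=0.05, one_sided=True` gives the same lower root as a two-sided call with `alpha=0.10`, which is the 95 percent / 90 percent correspondence described above.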


APPENDIX AXVI.1. CONFIDENCE BOUNDS ON BINOMIAL PROBABILITIES


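We observe $\hat p = Y/n$, $Y \sim \mathrm{Bi}(n, p)$.

We wish to construct a 1 - α two sided confidence interval $(p_L, p_U)$ on p.

These limits can be obtained easily using the Clopper-Pearson charts. See Dixon and Massey [13], pp. 501-504. Tables of such confidence limits are given in Natrella [48]. The limits are

$$p_L = \left[1 + \frac{n - Y + 1}{Y}\, F(2n - 2Y + 2,\ 2Y;\ 1 - \alpha/2)\right]^{-1}, \qquad p_L = 0 \text{ if } Y = 0,$$

$$p_U = \left[1 + \frac{n - Y}{Y + 1} \cdot \frac{1}{F(2Y + 2,\ 2n - 2Y;\ 1 - \alpha/2)}\right]^{-1}, \qquad p_U = 1 \text{ if } Y = n,$$

where $F(\nu_1, \nu_2; \gamma)$ denotes the $\gamma$ quantile of the F distribution with $\nu_1$ and $\nu_2$ degrees of freedom.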

Appendix AXVI.1 is an appendix for Section XVI.
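The F-distribution formulas above give the exact Clopper-Pearson limits. A stdlib-only Python sketch can reach the same limits by a different computational route, bisection on the binomial tail probabilities: $p_L$ solves $P(Y \ge y; p) = \alpha/2$ and $p_U$ solves $P(Y \le y; p) = \alpha/2$. This swapped-in method and the function names are hypothetical illustrations, not the report's program.

```python
from math import comb

def binom_cdf(y, n, p):
    """P(Y <= y) for Y ~ Bi(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(y + 1))

def _bisect_decreasing(f, lo=0.0, hi=1.0, iters=100):
    """Root of a decreasing function f on [lo, hi] by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid          # root lies to the right of mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def clopper_pearson(y, n, alpha=0.05):
    """Exact 1 - alpha two sided limits (p_L, p_U) on p from Y = y successes in n trials."""
    # p_L: P(Y >= y; p) = alpha/2, where P(Y >= y) = 1 - cdf(y - 1) increases in p
    p_lower = 0.0 if y == 0 else _bisect_decreasing(
        lambda p: alpha / 2 - (1 - binom_cdf(y - 1, n, p)))
    # p_U: P(Y <= y; p) = alpha/2, where cdf(y) decreases in p
    p_upper = 1.0 if y == n else _bisect_decreasing(
        lambda p: binom_cdf(y, n, p) - alpha / 2)
    return p_lower, p_upper
```

For Y = 0 the upper limit reduces to $1 - (\alpha/2)^{1/n}$, which gives a quick check on any implementation.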
I. INTRODUCTION


Dose  response  experimentation  has  many  applications  such  as 
toxicity  tests,  bioassay,  engineering  stress  tests,  tests  of  response 
to  advertising  campaigns,  just  to  name  a  few.  This  writeup  considers 
application  of  dose  response  experimentation  to  tests  of  toxicity  on 
fish and other water species. However, the procedures and the computer program discussed are relevant to many other applications.

In  aquatic  toxicity  tests  samples  of  fish  or  other  water  species 
are  exposed  in  tanks  or  beakers  to  the  substance  or  substances  under 
study.  The  exposures  are  carried  out  at  a  succession  of  concentrations 
starting  at  or  near  zero  (i.e.  the  control  groups)  and  progressing  to 
relatively  high  and  lethal  concentrations.  Responses  of  these  creatures 
to  the  toxicant  are  recorded  and  degrees  of  response  are  compared 
across  concentration  groups. 

Many  different  responses  are  recorded.  Some  of  these  are  percent 
of  embryos  hatched  live  or  hatched  normal,  percent  of  fry  surviving 
for  a  fixed  duration  or  to  the  conclusion  of  the  test,  body  weight  or 
body  length  of  surviving  fish,  numbers  of  eggs  laid,  numbers  of  eggs 
hatched. These various responses give rise to several different types of data: quantitative (e.g. body weights), count (e.g. number of eggs laid), and quantal response (e.g. hatch/no hatch, live/die, normal/abnormal). The discussion here pertains to quantal responses.


Quantal response toxicity tests often give rise to binomial distribution data. Namely, within each test chamber a certain number of organisms are placed on test. Under reasonable assumptions the numbers of "successes" (e.g. numbers of fish per tank that die before the conclusion of the test) follow the binomial distribution. Many standard methods have been proposed over the years to fit models to such binomial dose response data. Finney (1971) and Cox (1970) discuss two of the most commonly used empirical models, namely the probit and logit dose response functions. Background response is commonly accounted for in these models by means of Abbott's correction (see Finney, Chapter 7). A number of other parametric dose response models have been proposed, based on empirical or on mechanistic considerations. See Kalbfleisch and Prentice (1980), pp. 195-198 for a description of a general family of dose response models.
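For reference, the probit and logit models with Abbott's correction for a background rate c can both be written as $p(d) = c + (1 - c)F(\beta_0 + \beta_1 d)$, with F the standard normal or the logistic distribution function. A minimal Python sketch, assuming d is whatever dose metric is used in the fit (often log concentration); the function names are hypothetical.

```python
import math
from statistics import NormalDist

def probit_abbott(d, beta0, beta1, c):
    """Abbott-corrected probit: c + (1 - c) * Phi(beta0 + beta1 * d)."""
    return c + (1.0 - c) * NormalDist().cdf(beta0 + beta1 * d)

def logit_abbott(d, beta0, beta1, c):
    """Abbott-corrected logit: c + (1 - c) / (1 + exp(-(beta0 + beta1 * d)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-(beta0 + beta1 * d)))
```

In both models the response rate never falls below the background rate c, which is the structured, parametric treatment of background referred to in drawback 4 below.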

Determining safe concentrations by inferences based on dose response curves has an important advantage over determinations based on hypothesis tests to compare treatment and control groups. Namely, if a particular test is either too small or too variable then a hypothesis test comparing treatment and control group response rates may not be sufficiently powerful. It may thus not be able to detect moderate sized changes in response rate from the control group. This has the effect of raising the estimated "no effect" level, which is unconservative. Such a problem might well arise in the presence of a reasonably sized background response rate. By contrast, decreased sample size or increased variability reduce lower confidence bounds on safe concentrations derived from dose response curve fits. In this sense, inferences about safe concentrations based on dose response curves are more conservative than inferences based on hypothesis tests.

However, parametric dose response models have the common drawbacks that:

1.  Inferences  about  percentiles,  especially  low  percentiles, 
of  the  response  distribution  can  be  very  sensitive  to  the 
specific  functional  form  assumed. 

2.  Inference  procedures  are  generally  based  on  asymptotic  normal 
maximum  likelihood  theory  and  may  thus  be  inappropriate  for 
data  sets  with  small  sample  sizes  or  with  many  treatment 
group  response  rates  near  0%  or  near  100%. 

3.  Results  of  the  high  concentration  treatment  groups, 
far  away  from  the  safe  concentrations,  have  important 
influence  on  the  estimation  of  the  low  percentiles  of  the 
dose  response  curve. 

4.  Background  responses  are  accounted  for  in  a  structured 
parametric  manner,  such  as  by  Abbott's  correction.  The 
form  of  this  background  correction  may  influence  the 
determination  of  safe  concentrations. 

The procedure discussed in this writeup avoids many of these problems. It does not require assumption of a specific functional form of the dose response curve. It is based on exact small sample distribution theory. It uses information only from the lower half of the dose response curve. It does not require a structured, parametric form of correction for background.
One assumption made throughout this writeup is that no tank to tank heterogeneity is present within treatment groups. This implies that the responses can be pooled across tanks within treatment groups and summarized by a single binomial distribution per group. If tank to tank heterogeneity exists it can be accounted for by adjusting the data to reflect within tank correlation, by fitting more complex models which explicitly account for such heterogeneity, or by carrying out analyses on a per tank basis. The procedure discussed in this writeup can be used in conjunction with the first adjustment approach.


II.  PROCEDURE 


The aim is to calculate a lower confidence bound on the "safe" concentration. The "safe" concentration is defined to be the greatest concentration for which the response rate is at most 100L% above the control group response rate. Note that this does not imply that 100L% response to toxicant is considered to be "acceptable". We wish to eliminate risk altogether. However, we can be confident with such a criterion that at worst we have limited the risk to this level.

The procedure described in this section is a nonparametric approach to determining such a safe concentration. Thus it is not necessary to specify a particular parametric form for the dose-response curve. The procedure was motivated by one discussed in Gross, Fitzhugh, and Mantel (1970) for quantitative response, but differs from it in a number of respects.

Consider  a  dose  response  curve  relating  percent  response  (e.g. 
mortality)  to  toxicant  concentration. 


[Figure: dose response curve, percent response versus toxicant concentration]


We concentrate on the portion of the curve that is concave upward. In the case of a probit or a logit model this would be all concentrations below the EC50 if there is no background, and even beyond the EC50 if there is background response.

Suppose that the toxicity test involves several tanks per concentration group and that we have determined which concentrations are in the concave upward portion of the curve (e.g. below the EC50). We might do this by looking at a graph or by a preliminary analysis.

Let $C_0, C_1, C_2, C_3, \ldots, C_r$ denote these concentrations. Since we assume the absence of tank to tank heterogeneity within groups, we pool the responses across tanks within groups to obtain the estimated response rates $\hat p_0$ at $C_0$ (control), $\hat p_1$ at $C_1$, $\hat p_2$ at $C_2$, $\hat p_3$ at $C_3$, ..., $\hat p_r$ at $C_r$. Let $p_0, p_1, p_2, p_3, \ldots, p_r$ denote the "true" response rates at these concentrations.

We wish to construct a lower confidence bound on the "safe" concentration.

Definition.  A  safe  concentration  is  one  that  increases  the  response  at 
most  L  (limit)  units  above  background. 

Let $C_L$ denote this "safe" concentration. Suppose it can be asserted on a priori grounds that $C_L$ lies in the interval $(0, C_k)$, where $C_1 < C_k < C_r$.
throughout the interval $(C_0, C_r)$. Thus $L/\beta$ is a lower bound on $C_L$.

Let $\bar p_k$ be an exact upper confidence bound on $p_k$ and let $\underline p_0$ be an exact lower confidence bound on $p_0$. Expressions for such exact confidence bounds on $p_0$, $p_k$ were derived by Clopper and Pearson (1934) and are contained in Hollander and Wolfe (1973), pp. 23-24. Set

$$\hat\beta_U = \frac{\bar p_k - \underline p_0}{C_k - C_0}. \qquad (2)$$

Then $\hat\beta_U$ is an upper confidence bound on β. Thus $C_0 + L/\hat\beta_U$ is a lower confidence bound on $L/\beta$ and therefore also on $C_L$. Since we assume that $C_L \le C_k$, our final lower confidence bound on $C_L$ is

$$\hat C_L = \min(C_0 + L/\hat\beta_U,\ C_k). \qquad (3)$$
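Equations (2) and (3) reduce to a few lines of arithmetic. The sketch below assumes the exact bounds $\underline p_0$ and $\bar p_k$ have already been computed (for instance with a Clopper-Pearson routine); the function name and the example numbers are hypothetical.

```python
def simple_safe_bound(c0, ck, p0_lower, pk_upper, L):
    """Lower confidence bound on C_L from equations (2) and (3).

    p0_lower: exact lower bound on p_0; pk_upper: exact upper bound on p_k.
    """
    beta_u = (pk_upper - p0_lower) / (ck - c0)     # equation (2)
    if beta_u <= 0:
        raise ValueError("upper bound on the slope is not positive")
    return min(c0 + L / beta_u, ck)                # equation (3)

# Hypothetical bounds: slope bound 0.28 / 4 = 0.07, so C_L >= 0.05 / 0.07.
bound = simple_safe_bound(0.0, 4.0, 0.02, 0.30, 0.05)
```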

It may be possible to increase $\hat C_L$ by including information from $C_{k+1}, C_{k+2}, \ldots, C_r$ in addition to that from $C_0$, $C_k$. Namely, we fit straight lines using $(C_0, C_k)$, $(C_0, C_{k+1})$, $(C_0, C_{k+2})$, ..., $(C_0, C_r)$, $(C_0, C_k, C_{k+1})$, $(C_0, C_k, C_{k+2})$, ..., $(C_0, C_k, C_{k+1}, \ldots, C_r)$. That is, we include $C_0$ and all possible combinations of $(C_k, C_{k+1}, \ldots, C_r)$. Thus there will be $2^{(r-k+1)} - 1$ lines calculated altogether. The lines are fitted by ordinary least squares.

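Consider the calculation of an upper bound on the slope of the line based on the subset of concentrations $C_0, C_{k_1}, C_{k_2}, \ldots, C_{k_J}$, where $C_k \le C_{k_1} < C_{k_2} < \cdots < C_{k_J} \le C_r$. Denote this as the "s-th subset" (in some ordering of subsets). Define

$$\bar C_s = \left(C_0 + \sum_{j=1}^{J} C_{k_j}\right) \Big/ (J + 1), \qquad (4)$$

$$\hat\beta_{U,s} = \frac{(C_0 - \bar C_s)\,p_0^* + \sum_{j=1}^{J}(C_{k_j} - \bar C_s)\,p_{k_j}^*}{(C_0 - \bar C_s)^2 + \sum_{j=1}^{J}(C_{k_j} - \bar C_s)^2}, \qquad (5)$$

where

$$p_0^* = \underline p_0, \qquad p_{k_j}^* = \begin{cases} \underline p_{k_j} & \text{if } C_{k_j} < \bar C_s, \\ \bar p_{k_j} & \text{if } C_{k_j} \ge \bar C_s, \end{cases} \qquad (6)$$

for $s = 1, 2, \ldots, 2^{(r-k+1)} - 1$. We let

$$\hat\beta_U = \min_s \hat\beta_{U,s}, \qquad (7)$$

$$\hat C_L = \min(L/\hat\beta_U,\ C_k). \qquad (8)$$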
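The subset search in equations (4) through (8) can be sketched in Python as below, assuming index 0 holds the control and that lower and upper binomial bounds are supplied for every concentration group. The names and numbers are hypothetical illustrations, not the report's Fortran program.

```python
from itertools import combinations

def slope_upper_bound(c, p_lower, p_upper, k):
    """beta_U = min over subsets s of beta_{U,s}, equations (4) to (7).

    c[i], p_lower[i], p_upper[i] for i = 0..r (index 0 is the control).
    Each subset is C_0 plus a nonempty subset of C_k, ..., C_r.
    """
    r = len(c) - 1
    best = float("inf")
    for size in range(1, r - k + 2):
        for subset in combinations(range(k, r + 1), size):
            idx = (0,) + subset
            c_bar = sum(c[i] for i in idx) / len(idx)         # equation (4)
            num = 0.0
            den = 0.0
            for i in idx:
                dev = c[i] - c_bar
                # Equation (6): lower bound for C_0 and below the mean, upper bound otherwise.
                p_star = p_lower[i] if (i == 0 or c[i] < c_bar) else p_upper[i]
                num += dev * p_star
                den += dev * dev
            best = min(best, num / den)                        # equations (5) and (7)
    return best

def safe_concentration_bound(c, p_lower, p_upper, k, L):
    """Equation (8): C_L = min(L / beta_U, C_k)."""
    beta_u = slope_upper_bound(c, p_lower, p_upper, k)
    if beta_u <= 0:
        raise ValueError("upper bound on the slope is not positive")
    return min(L / beta_u, c[k])
```

With only the pair $(C_0, C_k)$ the least squares slope equals $(\bar p_k - \underline p_0)/(C_k - C_0)$, so equation (2) is the single-subset special case.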


There is a problem associated with simultaneous inference. We wish to assert with a given level of confidence that $\hat\beta_U \ge \beta$. It can be proved that if $\underline p_j \le p_j \le \bar p_j$, j = 0, k, k+1, k+2, ..., r, then $\hat\beta_U \ge \beta$. To guarantee this, with specified probability 1 - α, we use Bonferroni's inequality. Namely, we calculate r-k+2 simultaneous two-sided 1 - α confidence intervals by calculating each two-sided confidence interval at level 1 - α/(r-k+2). Thus, with probability at least 1 - α, all the intervals are assured of containing their "true" response rates.
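The Bonferroni arithmetic is small but easy to get wrong by a factor of two. In the hypothetical helper below, each of the r - k + 2 two-sided intervals is computed at level $1 - \alpha/(r-k+2)$, which leaves $\alpha/(2(r-k+2))$ in each tail; that tail probability is what would play the role of $\alpha/2$ in an exact binomial routine.

```python
def bonferroni_two_sided(alpha, r, k):
    """Family size, individual level, and per-tail probability for j = 0, k, ..., r."""
    m = r - k + 2                    # control plus concentrations C_k, ..., C_r
    level = 1.0 - alpha / m          # individual two-sided confidence level
    tail = alpha / (2.0 * m)         # probability left in each tail of each interval
    return m, level, tail

m, level, tail = bonferroni_two_sided(0.05, r=4, k=2)
```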
In the case of "large" samples we can calculate alternative upper bounds on β by use of the normal approximation to the binomial distribution. Assume for definiteness that the minimum cell frequency is at least 5. That is, $N_j \hat p_j \ge 5$, $N_j \hat q_j \ge 5$, j = 0, k, k+1, ..., r. (Actually Dixon and Massey (1969), page 238, state a more liberal standard.) Then

$$\hat\beta_s = \frac{(C_0 - \bar C_s)\,\hat p_0 + \sum_{j=1}^{J}(C_{k_j} - \bar C_s)\,\hat p_{k_j}}{(C_0 - \bar C_s)^2 + \sum_{j=1}^{J}(C_{k_j} - \bar C_s)^2}$$

is a point estimate of

$$\beta_s = \frac{(C_0 - \bar C_s)\,p_0 + \sum_{j=1}^{J}(C_{k_j} - \bar C_s)\,p_{k_j}}{(C_0 - \bar C_s)^2 + \sum_{j=1}^{J}(C_{k_j} - \bar C_s)^2},$$

with

$$\mathrm{s.e.}(\hat\beta_s) = \frac{\left[(C_0 - \bar C_s)^2\,\dfrac{\hat p_0 \hat q_0}{N_0} + \sum_{j=1}^{J}(C_{k_j} - \bar C_s)^2\,\dfrac{\hat p_{k_j}\hat q_{k_j}}{N_{k_j}}\right]^{1/2}}{(C_0 - \bar C_s)^2 + \sum_{j=1}^{J}(C_{k_j} - \bar C_s)^2}.$$

Then

$$\hat\beta_{U,s} = \hat\beta_s + z_{1-\alpha}\,\mathrm{s.e.}(\hat\beta_s)$$

is an upper 1 - α level normal theory confidence bound on β.
To obtain simultaneous upper confidence intervals we use Bonferroni's inequality for family size $2^{(r-k+1)} - 1$; i.e., individual intervals at level $1 - \alpha/(2^{(r-k+1)} - 1)$. As before,

$$\hat\beta_U = \min_s \hat\beta_{U,s},$$

$$\hat C_L = \min(L/\hat\beta_U,\ C_k).$$
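A minimal Python sketch of the large-sample version, assuming pooled counts per concentration group with index 0 as the control. It checks the minimum cell frequency of 5, applies the Bonferroni family size $2^{(r-k+1)} - 1$, and takes the minimum of $\hat\beta_s + z\,\mathrm{s.e.}(\hat\beta_s)$ over subsets. Function name and data are hypothetical.

```python
import math
from itertools import combinations
from statistics import NormalDist

def normal_theory_safe_bound(c, x, n, k, L, alpha=0.05):
    """Large-sample lower bound on C_L using normal theory slope bounds.

    c[i], x[i], n[i] for i = 0..r: concentration, pooled number of responses,
    pooled number of organisms (index 0 is the control).
    """
    r = len(c) - 1
    p = [xi / ni for xi, ni in zip(x, n)]
    for i in [0] + list(range(k, r + 1)):        # minimum cell frequency of 5
        if n[i] * p[i] < 5 or n[i] * (1 - p[i]) < 5:
            raise ValueError(f"cell frequency below 5 at index {i}")
    m = 2 ** (r - k + 1) - 1                     # Bonferroni family size
    z = NormalDist().inv_cdf(1.0 - alpha / m)    # one-sided z_{1 - alpha/m}
    beta_u = math.inf
    for size in range(1, r - k + 2):
        for subset in combinations(range(k, r + 1), size):
            idx = (0,) + subset
            c_bar = sum(c[i] for i in idx) / len(idx)
            den = sum((c[i] - c_bar) ** 2 for i in idx)
            est = sum((c[i] - c_bar) * p[i] for i in idx) / den          # beta_hat_s
            var = sum((c[i] - c_bar) ** 2 * p[i] * (1 - p[i]) / n[i] for i in idx) / den ** 2
            beta_u = min(beta_u, est + z * math.sqrt(var))               # beta_hat_{U,s}
    if beta_u <= 0:
        raise ValueError("upper bound on the slope is not positive")
    return min(L / beta_u, c[k])
```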
III. INSTRUCTIONS FOR PROGRAM INPUT


Mortality  or  Abnormality  Data: 

In this program, subroutine RDSURV handles mortality data input. The mortality or abnormality data are read from Fortran file 4. Each input card of the mortality data must supply the program with the following two numbers (in this order):

1. The number of organisms tested in each replicate of each treatment, and

2. the number of organisms in each replicate surviving the test (or surviving normally).

The data must be input in increasing order of treatments.

Important. The user supplies the input format for the card image of the survival data (see input card 4 in Fortran file 5) with variable format statements.

Concentration  (Dose)  Data: 

In this program, subroutine RDCONC handles concentration data input. The concentration data are read from Fortran file 9. Each input card from this file must supply the program with the following two numbers (in this order):

1.  The  treatment  number,  and 

2.  the  concentration  measurement  corresponding  to  this 
treatment  number. 


Notes  to  the  user: 


1. The data must be input in increasing order of treatments.

2.  The  user  may  input  many  concentration  measurements  for  a 
given  treatment  since  the  program  calculates  an  average 
concentration  for  each  treatment. 
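Note 2 says the program averages repeated concentration measurements within each treatment. A hypothetical Python equivalent of that step is sketched below; it is not the RDCONC subroutine, whose internals are not described here, and it also enforces the increasing-order rule from note 1.

```python
from collections import defaultdict

def average_concentrations(records):
    """Average concentration measurements per treatment number.

    records: (treatment_number, concentration) pairs in increasing treatment order.
    Returns {treatment: mean concentration} with treatments in increasing order.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    previous = None
    for treatment, conc in records:
        if previous is not None and treatment < previous:
            raise ValueError("records must be in increasing order of treatments")
        previous = treatment
        totals[treatment] += conc
        counts[treatment] += 1
    return {t: totals[t] / counts[t] for t in sorted(totals)}
```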

Important.  The  input  format  for  the  card  image  of  the  concentration 
data  is  user-supplied  with  variable  format  statements. 
(See  input  card  6  in  Fortran  file  5). 


Figure  1.  Example  of  Program  Input  Stream. 
DATA INPUT


Values  Used  by  the  Program: 

The input cards are read from Fortran file 5. There must be 9 input cards in this file. An example input stream is illustrated in Figure 1 and a sample deck suitable for card input on the AMDAHL 470 V6 computer is illustrated in Figure 2. The following descriptions refer to the numbered cards in Figure 1.

Card  #1  is  the  title  card.  The  user  is  to  choose  a  title  or  message 
and  place  it  anywhere  in  the  first  72  columns  of  this  card. 

If  the  user  does  not  wish  to  supply  a  title,  this  first  card 
must  be  blank. 

A  typical  title  is  illustrated  on  the  Fortran  Coding  Form, 
line  1 . 

Card  #2  contains  the  number  of  treatments  and  the  number  of  replicate 
tanks within each treatment. Suppose N1 = number of treatments and N2 = number of replicates. N1 and N2 must be integer values.

If 1 ≤ N1 ≤ 9, place N1 in column 2 of this card.

If N1 = 10, place N1 in columns 1 and 2 of the card.

If 1 ≤ N2 ≤ 9, place N2 in column 4.

If N2 is a two-digit number, place N2 in columns 3 and 4.

(See  program  limitation  1.) 

An  example  of  this  card  is  found  on  Fortran  Coding  Form, 
line  2.  The  numbers  '6'  and  '4'  indicate  that  there  are 
6  treatments  with  4  replicates  in  each  treatment. 
Card #3 supplies the program with the number of descriptive cards (lines) at the beginning of the mortality dataset. (A descriptive card may be any card at the beginning of the dataset which is not data; e.g., a typical card might contain information about the contents of the dataset.)

The number on this card must be an integer from 0 to 99. Suppose NS = number of descriptive cards.

If 0 ≤ NS ≤ 9, place NS in column 2 of this card.

If 10 ≤ NS ≤ 99, place NS in columns 1 and 2.

Note: If the user has no descriptive cards in the mortality dataset, place a '0' in column 2 of the card.

An  example  of  this  card  is  found  on  the  Fortran  Coding  Form, 

line  3.  The  number  '5'  indicates  that  there  are  5  descriptive 
cards  at  the  head  of  the  mortality  dataset. 

Card  #4  contains  the  format  for  the  input  line  image  of  the  mortality 
data.  This  format  must  supply  the  program  with  the  following 
two  values  (in  this  order):  The  number  of  objects  tested 
within  a  given  replicate  of  a  given  treatment,  and  the 
number  of  these  objects  surviving  the  test  (or  surviving 
normally).  Line  4  on  the  Fortran  Coding  Form  contains  a 
typical  variable  format  statement.  This  format  indicates 
to  the  program  to  tabulate  to  column  16  of  the  card,  skip 
15  spaces,  and  read  the  next  2  values  from  the  card. 


Card #5 contains the number of descriptive cards at the head of the concentration dataset. This card is similar to Card #3. Suppose NC = number of descriptive cards.

If 0 ≤ NC ≤ 9, place NC in column 2.

If 10 ≤ NC ≤ 99, place NC in columns 1 and 2.

An  example  of  this  card  is  line  5  on  the  Fortran  Coding  Form. 
The  number  '5'  indicates  there  are  5  descriptive  cards  at 
the  head  of  the  concentration  dataset. 

Card  #6  contains  the  format  for  the  input  image  of  the  concentration 
(dose)  data.  This  card  is  similar  to  Card  #4.  The  format 
of  this  card  must  supply  the  program  with  the  following 
two  numbers  (in  this  order):  an  integer  value  for  the 
treatment  number,  and  a  real  value  for  the  concentration 
measurement  within  that  treatment. 

Note  line  6  on  the  Fortran  Coding  Form.  This  variable 
format  instructs  the  program  to  read  an  integer  value 
from  the  first  2  columns,  tabulate  to  column  30,  and 
read  the  real  value  beginning  in  that  column. 

Card  #  7  contains  the  user's  choices  for  3  parameter  values: 

1.  the  upper  bound  on  the  upward  concavity  region  (UCR), 

2.  the  level  of  significance,  or  alpha  level  (ALEVEL), 

3.  the  value  of  a  flag  indicating  the  user's  desire  for 
simultaneous  confidence  intervals.  (IFLG1) 


The user should place UCR in columns 1-8 with a decimal point in column 4. This allows the user to specify a number as large as 999.9999. ALEVEL is placed in columns 11-16 with a decimal point in column 12. Note that this value must be a number between 0 and 1. If the user desires simultaneous confidence intervals, the value of IFLG1 must be 1. Place a '1' in column 19 if this is the case; otherwise, leave the column blank.

Note line 7 on the Fortran Coding Form. The number '15.0' indicates that the upward concavity region lies between 0 and 15.0 (0 and 15.0 are possible concentration values). The number '0.05' indicates the level of significance, and the '1' in column 19 tells the program that a simultaneity adjustment is desired. (Thus 0.05 is interpreted as the familywise error rate.)

Card #8 contains the number of values of k to be considered in the analysis, followed by the actual k values. (Recall that the safe concentration is assumed to lie between C0 and Ck.)

The values given on this card must be integers. Suppose NK = number of k's to be considered.

If 1 ≤ NK ≤ 9, place NK in column 2 of this card.

If  NK  =  10,  place  NK  in  columns  1  and  2. 

Leave  columns  3  and  4  blank.  Columns  5-24  contain  the 
actual  values  of  k.  Each  value  is  allotted  2  columns; 
e.g.,  columns  5  and  6  contain  the  first  value  of  k,  and 
columns  7  and  8  contain  the  second  value,  etc. 
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If  the  value  for  k  is  a  treatment  number  from  1  to  9, 
place  the  value  in  the  rightmost  column  of  the  field. 
Otherwise,  the  value  for  k  would  equal  10  and  both  columns 
are  used  for  this  two-digit  number. 

Note:  NK  must  equal  the  number  of  actual  values  placed 
on  this  card. 

Line  8  on  the  Fortran  Coding  Form  is  a  typical 
example  of  Card  Number  8.  Column  2  contains  the  number 
'3',  indicating  that  3  values  for  k  are  to  be  considered. 
The numbers '3', '4', '5' are these k values.

Card #9 contains the number of L's to be considered followed by the actual values for L. (Recall that L represents the incremental response rate over background associated with the "safe" concentration.) The values on this last card are as follows: Columns 1 and 2 contain an integer value for the number of L's. Suppose NL = number of L's. If 1 ≤ NL ≤ 9, place NL in column 2. If NL = 10, place NL in columns 1 and 2. Columns 3 and 4 are to be left blank.
Beginning  with  column  5,  the  remaining  columns  are  used  to 
specify  the  desired  values  of  L.  Each  value  of  L  uses 
6  columns;  e.g.,  the  first  value  appears  in  columns  5-10 
with  a  decimal  point  in  column  6.  A  decimal  point  must 
be  placed  in  the  second  column  of  a  given  field. 

Note  that  L  must  be  a  number  from  0  to  1.  The  user  is 
allowed  at  most  4  digits  to  the  right  of  the  decimal 
point.  The  example  of  Card  Number  9  is  found  on  the 
Fortran  Coding  Form,  line  9.  A  '3'  appears  in  column  2 
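
The fixed column layouts described for Cards #7, #8, and #9 can be checked with a small Python sketch that builds the card images as strings. The helper names are hypothetical, and the sketch only mirrors the column positions stated in the card descriptions; it does not reproduce the program's own FORMAT statements.

```python
def card7(ucr, alevel, simultaneous):
    """Card #7: UCR in columns 1-8 (point in column 4), ALEVEL in 11-16 (point in 12), flag in 19."""
    ucr_field = f"{ucr:8.4f}"
    if len(ucr_field) != 8 or ucr < 0:
        raise ValueError("UCR must lie between 0 and 999.9999")
    if not 0.0 < alevel < 1.0:
        raise ValueError("ALEVEL must lie between 0 and 1")
    flag = "1" if simultaneous else " "          # IFLG1 = 1 requests simultaneous intervals
    return f"{ucr_field}  {alevel:6.4f}  {flag}"

def card8(k_values):
    """Card #8: NK right-justified in columns 1-2, columns 3-4 blank, k values in 2-column fields."""
    if not 1 <= len(k_values) <= 10 or any(not 1 <= k <= 10 for k in k_values):
        raise ValueError("1 to 10 values of k, each a treatment number from 1 to 10")
    return f"{len(k_values):2d}  " + "".join(f"{k:2d}" for k in k_values)

def card9(l_values):
    """Card #9: NL in columns 1-2, columns 3-4 blank, L values in 6-column fields (point in 2nd column)."""
    if not 1 <= len(l_values) <= 10 or any(not 0.0 <= v <= 1.0 for v in l_values):
        raise ValueError("1 to 10 values of L, each between 0 and 1")
    return f"{len(l_values):2d}  " + "".join(f"{v:6.4f}" for v in l_values)

print(card8([3, 4, 5]))
```

The `card8` call reproduces the line 8 example (NK = 3 in column 2, k values 3, 4, 5); the L values passed to `card9` in any test run would be user choices, since the example values on line 9 are not given here.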
PROGRAM LIMITATIONS


Array  Size  Limitations: 

1. The number of treatments may not exceed 10 and (the number of treatments) × (the number of replicates per treatment) may not exceed 40. For example, the program could handle as many as 10 treatments with 4 replicates each, or 8 treatments with 5 replicates each.

2.  The  user  can  supply  this  program  with  no  more  than  10  values 
for  k  (input  card  8  in  Fortran  file  5).  Similarly,  the 
user  is  allowed  no  more  than  10  values  for  L  (input  card  9 
in  Fortran  file  5). 
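Before preparing a deck it may help to check a planned run against these array limits. A hypothetical Python helper that mirrors the two stated limitations:

```python
def check_array_limits(n_treatments, n_replicates, k_values, l_values):
    """Return a list of violated limitations (empty if the run fits the program's arrays)."""
    problems = []
    if n_treatments > 10:
        problems.append("more than 10 treatments")
    if n_treatments * n_replicates > 40:
        problems.append("treatments x replicates exceeds 40")
    if len(k_values) > 10:
        problems.append("more than 10 values of k")
    if len(l_values) > 10:
        problems.append("more than 10 values of L")
    return problems
```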

Output  Limitations: 

1. Only one title card is allowed and the user must restrict his title to the first 72 columns of the title card. (See input card 1 in Fortran file 5.)

2. Subroutine WRITE! prints the data summary found on the first page of output.

The  number  of  objects  tested  and  the  number  of 
survivals  are  each  printed  in  format  F6.0.  Therefore,  any 
quantity  greater  than  99999.  will  not  print  correctly. 
Similarly,  the  concentration  for  each  treatment  is  printed 
in  F8.4  format.  Numbers  greater  than  999.9999  will  not 
print  correctly. 




DISTRIBUTION LIST

25 copies  Commander, US Army Medical Bioengineering Research and Development Laboratory, ATTN: SGRD-UBG, Fort Detrick, Frederick, MD 21701

4 copies  Commander, US Army Medical Research and Development Command, ATTN: SGRD-RMS, Fort Detrick, Frederick, MD 21701

12 copies  Defense Technical Information Center (DTIC), ATTN: DTIC-DDA, Cameron Station, Alexandria, VA 22314

1 copy  Dean, School of Medicine, Uniformed Services University of the Health Sciences, 4301 Jones Bridge Road, Bethesda, MD 20014

1 copy  Commandant, Academy of Health Sciences, US Army, ATTN: AHS-CDM, Fort Sam Houston, TX 78234

1 copy  Commander, US Army Medical Bioengineering Research and Development Laboratory, ATTN: SGRD-UBD-A/Librarian, Fort Detrick, Frederick, MD


